Gaze Tracking by Using Factorized Likelihoods Particle Filtering and Stereo Vision


Gaze Tracking by Using Factorized Likelihoods Particle Filtering and Stereo Vision

Erik Pogalin
Information and Communication Theory Group, Delft University of Technology
P.O. Box 5031, 2600 GA Delft, The Netherlands

Abstract

In the area of visual perception research, information about a person's attention on visual stimuli shown on a screen can be used for various purposes, such as studying the phenomenon of human vision itself or investigating eye movements while that person is looking at images and video sequences. This paper describes a non-intrusive method to estimate the gaze direction of a person by using stereo cameras. First, facial features are tracked with a particle filtering algorithm to estimate the 3D head pose. The 3D gaze vector is then calculated by finding the eyeball center and the cornea center of both eyes. For the purpose mentioned above, we also propose a screen registration scheme that accurately locates a planar screen in world coordinates to within 2 mm. With this information, the gaze projection on the screen can be calculated. The experimental results indicate that an average gaze direction error of about 7° can be achieved.

Keywords: Gaze tracking, facial feature tracking, particle filtering, stereo vision.

1 INTRODUCTION

An eye gaze tracker is a device that estimates the direction of the gaze of human eyes. Gaze tracking can be used for numerous applications, ranging from diagnostic applications such as psychological and marketing research to interactive systems in the Human-Computer Interaction (HCI) domain ([4], [17]). For example, studying eye movements during reading can be used to diagnose reading disorders. Investigating the user's attention on advertisements can help to improve their effectiveness. In the HCI domain, gaze tracking can be used as a way to interact with machines, e.g. as a pointing device for disabled people operating a computer or as a support system in cars that alerts drivers when they fall asleep.

Several commercial gaze tracking products exist that are highly accurate and reliable. They are mostly based on a so-called infrared technique. Tobii [18] and ERT [6] developed systems that use a motorized camera and infrared lighting to track the eye gaze. Their products are mainly used for visual perception research. Other companies such as Fourward [8] and ASL [1] use head-mounted cameras to track the user's eyes from a close distance. These kinds of products are suitable for user interaction as well as visual perception research.

There are two disadvantages which make these infrared-based gaze tracking products less attractive for wide use. Most of them require special hardware such as motorized cameras, helmets or goggles, making the product expensive (between US$15,000 and US$150,000 as reported in [19]). Furthermore, this special hardware can cause discomfort and restricts the user's movements.

In this paper, we design a gaze tracking scheme in the framework of visual perception research. In a typical experiment, users are asked to watch visual stimuli that are displayed on a screen [4]. Their gaze projection on the text, image or video sequence shown on the screen can be used for various purposes, such as diagnosing reading disorders, analyzing the effectiveness of advertisements or investigating differences in attention while evaluating the image quality of a video sequence.
Considering these applications and the two disadvantages mentioned above, we summarized the following requirements as guidelines during the system design:

- The system should detect and track the user's gaze on a 2D screen by estimating the intersection point between the gaze ray and the screen.
- The system must use a non-intrusive technique.
- The system should track a single user at a time.
- The system does not have to work in real time.
- The system should be made as cheap as possible, and it should be possible to use the system for user-interaction purposes.
- The average angular gaze error should not exceed 5°.

Inspired by the work of Matsumoto et al. [15] and Ishikawa et al. [12], who used completely non-intrusive methods to estimate gaze directions in 3D, we make another contribution to this type of solution by introducing some modifications to their method. Our tracking scheme combines the auxiliary particle filtering algorithm of [16] with stereo information to detect and track facial features such as the eye and mouth corners. The 3D locations of these features determine the pose of the head. Furthermore, we use a 3D eye model which assumes that the eyeball is a sphere. Unlike Ishikawa et al., we choose to use the corners of the eye socket instead of corners located on the eyeball surface. This makes the tracking more robust to occlusions and eye blinks. Finally, we devised a screen registration scheme to locate a 2D surface that is not visible in the camera view (such as a monitor positioned behind the camera) by using a special mirror. In this way, the screen location in the world coordinate system is known accurately, so that we can directly calculate the intersection of the gaze ray with the screen. Besides the screen, other objects could be registered in the world coordinate system as well. With minor modifications the system could therefore easily be applied for user-interaction purposes.

This paper is organized as follows. In section 2 we present a short summary of the work that has been done previously in eye gaze tracking. The outline of our gaze tracking system is presented in section 3. In section 4 we discuss the calibration of the cameras and the registration of the 2D screen. Next, the two most important modules of the system, head pose tracking and gaze direction estimation, are described in sections 5 and 6, respectively. The system performance is evaluated and the results are given in section 7, and finally, section 8 concludes this paper with a discussion and recommendations for future work.

2 PREVIOUS WORK

In the last few years, gaze tracking research has concentrated on intrusive as well as non-intrusive video-based techniques. Using image processing and computer vision techniques, it is possible to compute the gaze direction without the need for any kind of physical contact with the user. The most popular technique is the use of infrared lighting to capture several reflections from parts of the eye (pupil, cornea and lens reflections) [4]. The relative position of these reflections changes with pure eye rotation, but remains relatively constant under minor head movements. With appropriate calibration procedures, this method estimates the user's point of regard on a planar surface (e.g. a PC monitor) on which calibration points are displayed. Several variations to interpolate the gaze from known calibration points have been reported in the literature, including the use of artificial neural networks ([2], [5], [13]). This infrared technique is widely applied in current commercial gaze trackers. However, it needs a high-resolution image of the eye, which explains the use of expensive hardware, such as a zoom-capable camera mounted below the screen or attached to a helmet.

Another approach that has been developed recently detects the head pose separately and uses this information to estimate the gaze direction in 3D. This method has several advantages compared to the infrared technique. Aside from the cheap hardware requirements (a pair of normal cameras and a PC), tracking is not restricted to the point of regard on a planar object.
Since the gaze is tracked in a 3D world, we can also intersect the gaze with other objects of interest, provided that those objects are properly registered in the 3D world (i.e. their locations are accurately known). Because of this, the system can easily be modified for interaction purposes.

Matsumoto et al. [15] used stereo cameras to detect and track the head pose in 3D. A 3D model for each user is built by selecting several facial features in the initialization phase. This 3D pose is rigidly tracked over time. To measure the gaze direction, the location of the eyeball center is calculated from the head pose and the cornea center is extracted from the stereo images. The vector that connects the eyeball center and the cornea center is the estimate of the gaze direction. The use of Active Appearance Models (AAM) has been proposed by Ishikawa et al. [12]. A 3D AAM is fitted to the user's face and tracked over time by using only a single camera. Similar steps as in [15] are taken to measure the 3D gaze vector. Another camera is used to view the scene, and by asking the user to look at several points in the world, the relative gaze orientation with respect to the projection of these points in the view-camera image can be interpolated.

This paper makes another contribution to the 3D gaze tracking method. The pose of the head is tracked by using the particle filtering algorithm proposed in [16]. Combined with stereo vision, the 3D head pose can be recovered. We use a slightly different eyeball model than the one used in [12] and [15]. Since visual perception research is our main concern, we also devise a screen registration scheme to locate a planar screen with respect to the cameras. With this information, the gaze projection on the screen can be calculated.

3 SYSTEM OUTLINE

Our gaze tracking system consists of three main modules: head pose tracking, gaze direction estimation and intersection calculation (figure 1). We use a 3D facial feature model to determine the 3D pose of the head. Together with a 3D eye model, the 3D gaze vector can be determined. Figure 2 shows the hardware setup of the system. A pair of USB cameras placed below the monitor is used to capture the user in the scene.

Figure 1. Block diagram of the gaze tracking system. The left part shows the off-line steps that have to be done before the actual tracking is performed.

Figure 2. Hardware setup of the gaze tracking system. A pair of USB cameras placed below the monitor is used to capture the user in the scene.

Several pre-processing steps must be done before performing the actual tracking. First of all, the stereo cameras must be calibrated. In the calibration process, the left camera reference frame is used as the world reference frame. Secondly, we need to register the screen position in world coordinates. In this way, after calibrating the cameras and the screen, we can directly compute the intersection of the gaze ray with the screen plane. The calibration procedure is discussed in detail in section 4. The third and last step is to estimate the user-dependent parameters of the 3D facial feature model and the 3D eye model. The facial feature model is built by taking several shots of the head under different poses. The eye model is created by acquiring a training sequence in which the user looks at several calibration points on the screen. The estimated parameters are used for the actual tracking. We refer to section 6 for more details on the eyeball model used.

The head pose tracking (section 5) is initialized manually in the first frame received by the cameras. In this initialization phase, we choose the facial features that we want to track and use the image coordinates of these features (in the left and right frame) as start positions for the head pose tracking. In our system, the corners of the eyes and mouth are selected. A rectangular color window defined around each chosen feature is used as reference template. These facial features are tracked throughout the whole video stream by using the particle filtering algorithm proposed in [16]. The system then performs stereo triangulation on each facial feature. The output of this module is the 3D locations of all features, which determine the pose of the head in the current frame.

Once we know the 3D locations of the eye corners, the location of the eyeball center can be determined (see section 6). A small search window is defined around the eye corners to search for the cornea center in the left and right frame. The 3D locations of the cornea centers are found by triangulation. The gaze is then defined by a 3D vector connecting the eyeball center and the cornea center. Two gaze vectors are acquired from the gaze direction estimation module, one for the left and one for the right eye. The last step is to intersect the gaze vectors from the left and right eye with the object of interest (e.g. the monitor screen). The intersection is done by extending each vector from the eyeball center until it reaches the screen. To compensate for the effect of noise, we take the average of the left and right projected gaze points and output the resulting single 2D screen coordinate. In the following sections, each module of the system will be discussed in more detail.
4 CAMERA CALIBRATION

This section discusses the calibration of the cameras and the registration of the 2D screen. The results are the intrinsic parameters of the cameras and the extrinsic parameters of the cameras and the screen (i.e. the relative position of the cameras and the screen with respect to the world reference frame). In section 4.1 we deal with the calibration of the stereo cameras, followed by the screen registration in section 4.2.

4.1 Calibrating Stereo Cameras

Camera calibration is done by using the method proposed by Zhang [20]. This method only requires the cameras to observe a planar checkerboard grid shown at different orientations (figure 3). In the following we describe the calibration notation that will be used in the remaining sections. A 2D point is denoted by x = [u v]^T and a 3D point by X = [x y z]^T. We use x̃, X̃ to denote the homogeneous coordinates of a 2D and a 3D point, respectively.

Figure 3. The setup used for stereo camera calibration. The origin of each camera frame is located at the pinhole of the camera. The left camera frame is also used as the world frame (w: world, l: left camera, r: right camera and g: calibration grid).

A pinhole camera model is used with the following notation:

λ x̃_im = K X_c,   X_c = R X_w + T    (1)

which relates a 3D point X_w = [x_w y_w z_w]^T in the world reference frame to its image projection x_im = [u v 1]^T in pixels, up to a scale factor λ. The matrix K, called the camera or calibration matrix, is given by

K = \begin{bmatrix} f_x & \alpha f_x & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}

and contains the intrinsic parameters: the focal lengths f_x and f_y, the coordinates of the principal point (u_0, v_0) and the skewness of the image axes α. The same 3D point X_w can be represented in the camera reference frame by X_c = [x_c y_c z_c]^T, which is related to X_w by a 3x3 rotation matrix R and a 3x1 translation vector T. This frame transformation can also be written as a single matrix:

M = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}

In this paper we use the left camera frame as the world frame, so for the left and right camera we have:

X̃_l = M_wl X̃_w,   M_wl = I_{4×4}
X̃_r = M_wr X̃_w,   M_wr = \begin{bmatrix} R_wr & T_wr \\ 0 & 1 \end{bmatrix} = M_lr    (2)

where I_{N×N} is an identity matrix of size N×N and M_lr denotes the extrinsic parameters of the stereo cameras, i.e. the transformation between the left and right camera frame.

We use a lens distortion model that incorporates radial and tangential distortion coefficients. Let x_d be the normalized and distorted image projection in the camera reference frame:

x_d = \begin{bmatrix} x_c / z_c \\ y_c / z_c \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}

and r^2 = x^2 + y^2. The undistorted coordinate x_ud is defined as follows [3]:

x_ud = D_r x_d + D_t    (3)

where

D_r = 1 + k_1 r^2 + k_2 r^4 + k_5 r^6,
D_t = \begin{bmatrix} 2 k_3 x y + k_4 (r^2 + 2x^2) \\ k_3 (r^2 + 2y^2) + 2 k_4 x y \end{bmatrix}

are the radial and tangential distortion terms, respectively. These coefficients can be represented by a single vector k = [k_1 k_2 k_3 k_4 k_5]^T. Finally, equation (1) can be modified to include the distortion model:

x̃_im = K x̃_ud    (4)
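To make the notation concrete, the sketch below projects a single 3D world point into pixel coordinates following equations (1), (3) and (4) as written above. It is only an illustration of the model, not code from the paper; the numerical camera parameters at the bottom are made up for the example.

```python
import numpy as np

def project_point(X_w, K, R, T, k):
    """Project a 3D world point to pixel coordinates (equations (1), (3), (4)).

    K : 3x3 calibration matrix; R, T : world-to-camera rotation and translation;
    k : [k1, k2, k3, k4, k5] radial (k1, k2, k5) and tangential (k3, k4) coefficients.
    """
    X_c = R @ X_w + T                               # world frame -> camera frame
    x, y = X_c[0] / X_c[2], X_c[1] / X_c[2]         # normalized projection x_d
    r2 = x * x + y * y
    k1, k2, k3, k4, k5 = k
    D_r = 1 + k1 * r2 + k2 * r2**2 + k5 * r2**3     # radial factor
    D_t = np.array([2 * k3 * x * y + k4 * (r2 + 2 * x * x),
                    k3 * (r2 + 2 * y * y) + 2 * k4 * x * y])   # tangential term
    x_ud = D_r * np.array([x, y]) + D_t             # equation (3)
    u, v, w = K @ np.array([x_ud[0], x_ud[1], 1.0]) # equation (4), homogeneous pixels
    return np.array([u / w, v / w])

# Example with made-up parameters: a point about 60 cm in front of the left camera.
K = np.array([[800.0, 0.0, 160.0], [0.0, 800.0, 120.0], [0.0, 0.0, 1.0]])
print(project_point(np.array([0.05, 0.02, 0.6]), K, np.eye(3), np.zeros(3),
                    [-0.25, 0.1, 1e-3, -1e-3, 0.0]))
```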

To estimate the intrinsic and extrinsic camera parameters, the following steps are taken:

- Acquiring stereo images. Position the two cameras so that an overlapping view of the user's head is achieved. Take a series of images of the calibration grid (figure 4) under different plane orientations.
- Extracting the grid reference frame. For each plane orientation, the four intersection corners of the pattern are chosen manually (the white diamonds in figure 4). The inner intersections are detected automatically by estimating the planar homography between the grid plane and its image projection [20]. All detected intersection points are then refined by the Harris corner detector [3] to achieve sub-pixel accuracy. From each image, we obtain the image coordinate of each intersection point x_im and its coordinate in the grid reference frame X_g = [x_g y_g 0]^T.
- Estimating individual camera parameters. The intrinsic parameters and the distortion coefficients of each camera are estimated by minimizing the pixel reprojection error of all intersection points in all images, in the least-squares sense. The initial guess for the parameters is made by setting the distortion coefficients to zero and choosing the centers of the images as the principal points. The initial focal lengths are calculated from the orthogonal vanishing points constraint [3].
- Estimating the parameters of both cameras. The individually optimized parameters for each camera from the previous step are now used as the initial guess for the total optimization (considering both cameras). At the end we obtain the optimized distortion coefficients of both cameras, the calibration matrix of each camera and the external parameters relating the two cameras.

Figure 4. The extracted intersection points from a calibration grid. The four intersection corner points are chosen manually (white diamonds), while the inner points are automatically extracted by using plane homography.

4.2 Registering the Screen to the World Frame

In order to intersect the gaze vector with the screen, the screen location with respect to the world frame must be determined. In other words, we need to determine the transformation M_ws from the world frame to the screen frame. We use the following method to estimate this transformation. A mirror is placed in front of the camera to capture the reflection of the screen. The camera perceives this reflection as if another screen is located at the same distance from the mirror but in the opposite direction (see figure 5). We attach a reference frame to each of the objects: O_w, O_m, O_v and O_s for the world, mirror, virtual screen and real screen frame, respectively. If we know the location of the mirror and this virtual screen, then we can also calculate the location of the real screen. By taking three co-planar points on the screen in world coordinates (e.g. points that lie on the XY-plane of the screen), we get the first two orthogonal vectors that define the screen reference frame. The third one can be computed by taking the cross product of these two vectors.

Figure 5. The hardware setup used for the screen registration. The stereo cameras are represented by two ellipses in front of the screen. Each object is shown with its own reference frame (w: world, m: mirror, v: virtual screen and s: screen).

Figure 6. The mirror used for the registration of the screen. A part of the reflection layer is removed, so that the camera can see the calibration pattern put behind the mirror. Compare the extracted reference frame with figure 5.

By displaying a calibration pattern on the screen, the virtual-screen-to-world frame transformation M_vw can be computed from the reflection of that pattern. With this information, we can choose three co-planar points and calculate their 3D world coordinates v^w_orig, v^w_long and v^w_short (figure 5). Then, applying the following transformation to each of these points results in the corresponding 3D screen points s^w_orig, s^w_long and s^w_short in world coordinates:

s̃^w_i = M_mw \, diag(1, 1, -1, 1) \, M_wm \, ṽ^w_i    (5)

In equation (5) the virtual points are first transformed to mirror coordinates via M_wm. The second matrix mirrors the points across the mirror's XY-plane.
After that, multiplying again with the inverse transformation M_mw gives the screen points in world coordinates s^w_i.

To determine the location of the mirror, a part of the mirror's reflection layer is removed, making that part transparent. A calibration pattern is placed behind the glass. For the calculation of the world-to-mirror frame transformation M_wm, the grid frame extraction from section 4.1 must be slightly modified: instead of extracting intersection points from the whole grid, only the points on the grid border need to be detected (figure 6).

The last step is to determine M_ws from the calculated screen points. The rotation and translation components of the transformation can be determined as follows:

s_xaxis = s^w_long - s^w_orig
s_yaxis = s^w_short - s^w_orig
s_zaxis = s_xaxis × s_yaxis

R_ws = [ŝ_xaxis  ŝ_yaxis  ŝ_zaxis]^T
T_ws = -R_ws s^w_orig    (6)

with ŝ_i as the normalized version of s_i.
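The two registration steps just described can be summarized in a few lines. The sketch below assumes the homogeneous world-to-mirror transformation M_wm and the three virtual-screen points are already available from the grid extraction, and uses the reflection matrix diag(1, 1, -1, 1) for the mirroring across the mirror's XY-plane; it is a minimal illustration of equations (5) and (6), not the authors' implementation.

```python
import numpy as np

MIRROR_XY = np.diag([1.0, 1.0, -1.0, 1.0])   # reflection across the mirror's XY-plane

def reflect_to_real_screen(v_w, M_wm):
    """Map a virtual-screen point (world coords) to the real screen (equation (5))."""
    M_mw = np.linalg.inv(M_wm)                # mirror-to-world transformation
    v_h = np.append(v_w, 1.0)                 # homogeneous coordinates
    s_h = M_mw @ MIRROR_XY @ M_wm @ v_h
    return s_h[:3]

def world_to_screen_transform(s_orig, s_long, s_short):
    """Build R_ws, T_ws from three co-planar screen points (equation (6))."""
    x_axis = s_long - s_orig
    y_axis = s_short - s_orig
    z_axis = np.cross(x_axis, y_axis)
    R_ws = np.vstack([a / np.linalg.norm(a) for a in (x_axis, y_axis, z_axis)])
    T_ws = -R_ws @ s_orig                     # X_s = R_ws X_w + T_ws maps s_orig to 0
    return R_ws, T_ws
```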

Since the camera calibration is only accurate for the space in which the calibration grid was positioned, we need to acquire two sets of images. The first set covers the space where the user's head and the mirror are supposed to be located, in front of the camera. For the second set, we place the calibration grid at the estimated location of the screen reflection (the virtual screen), farther away from the camera. The calibration is then performed over the joint set of images. After that, the screen registration described above can be carried out.

5 HEAD POSE TRACKING

In this section the head pose tracking module is discussed in detail. First, a short summary of the particle filtering algorithm is provided in section 5.1, followed by a description of the factorized likelihoods particle filtering scheme proposed in [16] (section 5.2). The 3D facial feature model that is used in our scheme is described in section 5.3. Finally, in section 5.4 we discuss the role of particle filtering in the head tracking module and propose the use of stereo information as prior knowledge for the tracking. The choice of particle filtering parameters is also discussed there.

5.1 Particle Filtering

Recently, particle filtering has become a popular algorithm for visual object tracking. In this algorithm, a probabilistic model of the state of an object (e.g. location, shape or appearance) and its motion is applied to analyze a video sequence. A posterior density p(x|Z) can be defined over the object's state, parameterized by a vector x, given the measurements Z from the images up to time t. This density is approximated by a discrete set of weighted samples, called particles (figure 7). At time t, this set is represented by {s_k, π_k}, which contains K particles s_1, s_2, ..., s_K and their weights π_1, π_2, ..., π_K (for easier notation, we drop the time index).

Figure 7. An illustration of the particle-based representation of a 1-dimensional posterior distribution. The continuous density is approximated by a finite number of samples or particles s_k (depicted by the circles). Each particle is assigned a weight π_k (represented by the circle radius) in proportion to the value of the observation density p(z|x = s_k), which is an estimate of the posterior density at s_k.

The main idea of particle filtering is to update this particle-based representation of the posterior density p(x|Z) recursively from previous time frames:

p(x|Z) ∝ p(z|x) p(x|Z⁻)
p(x|Z⁻) = ∫_{x⁻} p(x|x⁻) p(x⁻|Z⁻)    (7)

where the superscript ⁻ denotes the previous time instant. See [10] for the complete derivation of this equation. Beginning from the posterior of the previous time instant p(x⁻|Z⁻), a number of new particles are randomly sampled from the previous set {s_k, π_k}, which is approximately equal to sampling from p(x⁻|Z⁻). Particles with higher weights have a higher probability of being picked for the new set, while particles with lower weights can be discarded. Next, each of the chosen particles is propagated via the transition probability p(x|x⁻), resulting in a new set of particles. This is approximately equivalent to sampling from the density p(x|Z⁻) (equation (7), second line). In the last step, new weights are assigned to the new particles, measured from the observation density, that is, π_k = p(z|x = s_k). The new set of pairs {s_k, π_k} represents the posterior probability p(x|Z) at the current time t.
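The resample-propagate-reweight cycle described above can be written compactly as follows. This is a generic sketch for a single tracked object: the transition and likelihood callables stand in for p(x|x⁻) and p(z|x), and the random-walk example at the bottom uses made-up numbers.

```python
import numpy as np

def particle_filter_step(particles, weights, transition, likelihood, rng):
    """One update of the particle set {s_k, pi_k} (section 5.1).

    particles : (K, D) array, weights : (K,) array summing to 1,
    transition(p, rng) samples from p(x | x-), likelihood(p) returns p(z | x = p).
    """
    K = len(particles)
    # 1. Resample: particles with higher weights are picked more often.
    idx = rng.choice(K, size=K, p=weights)
    chosen = particles[idx]
    # 2. Propagate each chosen particle through the transition model.
    propagated = np.array([transition(p, rng) for p in chosen])
    # 3. Reweight with the observation density and normalize.
    new_w = np.array([likelihood(p) for p in propagated])
    new_w /= new_w.sum()
    return propagated, new_w

# Tiny 2D example: random-walk transition, Gaussian likelihood around a "measurement".
rng = np.random.default_rng(0)
parts = rng.normal([100.0, 80.0], 5.0, size=(200, 2))
w = np.full(200, 1.0 / 200)
meas = np.array([103.0, 78.0])
parts, w = particle_filter_step(
    parts, w,
    transition=lambda p, r: p + r.normal(0.0, 2.0, size=2),
    likelihood=lambda p: np.exp(-np.sum((p - meas) ** 2) / (2 * 4.0 ** 2)),
    rng=rng)
print("mean position:", (w[:, None] * parts).sum(axis=0))   # weighted average of the particles
```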
Once the new set is constructed, the moments of the state at the current time t can be estimated. We can, for instance, take the weighted average of the particles, obtaining the mean position:

E[x] = Σ_{k=1}^{K} π_k s_k    (8)

In our case, we consider a facial feature such as an eye or mouth corner as a single object, with its image location as the state. In every time frame, the facial feature location is tracked by evaluating the appearance of the feature. Several problems occur when this algorithm is used to track multiple objects [16]. One of these problems is that propagating each object independently deteriorates the tracking robustness when there are interdependencies between the objects. By incorporating this information in the tracking scheme, the propagation becomes more efficient, i.e. fewer particles are wasted on areas with low likelihood. For example, if we track multiple facial features individually without any information about the relative distances between the features, the rigidness of the face is lost. By introducing some constraints in the propagation of each facial feature, the rigidness of the face is preserved.

5.2 Auxiliary Particle Filtering with Factorized Likelihoods

The method summarized below, proposed in [16], is one of the improvements to particle filtering for the case of tracking multiple objects. The state is partitioned as x = [x_1 x_2 ... x_M]^T such that x_i (i = 1, 2, ..., M) represents the state of each object and M is the number of objects.

Each partition is propagated and evaluated independently:

p(x_i|Z) ∝ p(z|x_i) ∫_{x⁻} p(x_i|x⁻) p(x⁻|Z⁻)    (9)

Similar to the notation in section 5.1, each posterior p(x_i|Z) is represented by a set of sub-particles and their weights {s_ik, π_ik}, with k = 1, 2, ..., K and K the number of sub-particles. After separately propagating those sets, a proposal distribution is constructed from the individual posteriors: g(x) = ∏_i p(x_i|Z). By ignoring the interdependencies between the different x_i, we can construct the sample s_k = [s_1k s_2k ... s_Mk]^T (a concatenation of sub-particles) by independently sampling from each p(x_i|Z).

The individual propagation steps are summarized below. The density p(x|Z) now represents the posterior of all objects, instead of only one object. Starting from the set {s⁻_k, π⁻_k} of the previous time frame, the following steps are repeated for every partition i:

1) Propagate all K particles s⁻_k via the transition probability p(x_i|x⁻) in order to arrive at a collection of K sub-particles µ_ik. Note that while s⁻_k has the dimensionality of the state space x, µ_ik has the dimensionality of the partitioned state x_i.
2) Evaluate the observation likelihood associated with each sub-particle µ_ik, that is, let λ_ik = p(z|x_i = µ_ik).
3) Sample K particles from the collection {s⁻_k, λ_ik π⁻_k}. This favors particles with high λ_ik, i.e. particles which end up in areas with high likelihood when propagated with the transition probability.
4) Propagate each chosen particle s⁻_k via the transition probability p(x_i|x⁻) in order to arrive at a collection of K sub-particles s_ik. Note that s_ik has the dimensionality of partition i.
5) Assign a weight π_ik to each sub-particle as follows:

   w_ik = p(z|x_i = s_ik) / λ_ik,   π_ik = w_ik / Σ_j w_ij

After this procedure, we have M posteriors p(x_i|Z), each represented by {s_ik, π_ik}. Then, sampling K particles from the proposal function g(x) is approximately equivalent to constructing each particle s_k = [s_1k s_2k ... s_Mk]^T by independently sampling each s_ik from p(x_i|Z). Finally, in order for these particles to represent the total posterior p(x|Z), we need to assign to each particle a weight equal to [11]:

π_k = p(s_k|Z⁻) / ∏_i p(s_ik|Z⁻)    (10)

In other words, the re-weighting process favors particles for which the joint probability is higher than the product of the marginals. In the general case that the above equation cannot be evaluated by an appropriate model, the weights need to be estimated. Here, prior information such as the interdependencies between the objects is utilized. After normalizing the sum to one again, we end up with a collection {s_k, π_k} as the particle-based representation of p(x|Z).
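A sketch of steps 1-5 and the re-weighting of equation (10) is given below. It mirrors the structure of the scheme in [16] but is not the authors' code; the transition and likelihood models are passed in as callables, and the reweight callable stands in for whatever approximation of equation (10) is used (in this paper, the shape prior of section 5.4.3).

```python
import numpy as np

def auxiliary_pf_factorized(particles, weights, transition, likelihood, reweight, rng):
    """One time step of auxiliary particle filtering with factorized likelihoods.

    particles : (K, M, D) array -- K particles, M partitions (objects), D dims each.
    transition(x, rng)  -- samples partition i's state from p(x_i | x-).
    likelihood(i, x_i)  -- evaluates p(z | x_i).
    reweight(particle)  -- approximates equation (10) for a full particle.
    """
    K, M, D = particles.shape
    new_parts = np.empty_like(particles)
    for i in range(M):
        # Steps 1-2: propagate partition i of every old particle and score it.
        mu = np.array([transition(particles[k, i], rng) for k in range(K)])
        lam = np.array([likelihood(i, mu[k]) for k in range(K)])
        # Step 3: auxiliary resampling, favouring particles that land in likely areas.
        aux = lam * weights
        idx = rng.choice(K, size=K, p=aux / aux.sum())
        # Steps 4-5: propagate the chosen particles again and weight by p(z|s_ik)/lambda_ik.
        s_i = np.array([transition(particles[k, i], rng) for k in idx])
        w_i = np.array([likelihood(i, s_i[k]) for k in range(K)]) / lam[idx]
        w_i /= w_i.sum()
        # Sample partition i of the new particles from its marginal posterior.
        new_parts[:, i] = s_i[rng.choice(K, size=K, p=w_i)]
    # Re-weighting (equation (10)): favour particles whose joint configuration is
    # more probable than the product of the marginals, here via a supplied prior.
    new_w = np.array([reweight(p) for p in new_parts])
    return new_parts, new_w / new_w.sum()
```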
5.3 3D Facial Feature Model

The facial feature model in our scheme consists of two components:

- templates of the facial features' appearance
- relative 3D coordinates of the facial features (reference face model)

The facial features shown in figure 8 are defined as the corners of the eyes and mouth. This facial feature model is user-dependent and must be built before tracking can be performed. First, a stereo snapshot of the head is taken. From this shot the relative 3D positions of the facial features are extracted by manually locating the features in the left and right images and triangulating those features; together they form a reference shape model for the user's face. Next, at the beginning of each tracking process (initialization phase), the start positions of the facial features in the left and right frames are selected manually. Simultaneously, a rectangular image template around each feature is acquired. These templates are used in the tracking process.

Figure 8. The 3D facial feature model. On the left the facial feature templates are shown. On the right we see their locations in 3D, calculated from stereo images. The triangle represents the 2D face plane, formed by connecting the average locations of all three feature pairs.

5.4 Multiple Facial Feature Tracking

In this section we use the auxiliary particle filtering scheme described in the previous section for the problem of multiple facial feature tracking. Figure 9 shows the overview of the head tracking module in which the facial features are tracked. The facial feature templates from the initialization phase are used to track the features in 2D. The output of each particle-filtering block is a set of particles that represents the distribution of the 2D facial feature locations, for the left and right image respectively: {s_k, π_k}_L and {s_k, π_k}_R. In order to do the re-weighting process of equation (10), we use the reference face model (figure 8) as prior information on the relative 3D positions of the facial features.
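Both the construction of the reference shape model and the per-frame combination of left and right particles rely on triangulating corresponding image points from the two calibrated views. The paper does not spell out the triangulation method; the sketch below uses standard linear (DLT) triangulation from the two 3x4 projection matrices, which is one common choice (the pixel coordinates are assumed to be undistorted first).

```python
import numpy as np

def triangulate_dlt(x_left, x_right, P_left, P_right):
    """Linear (DLT) triangulation of one stereo correspondence.

    x_left, x_right : undistorted pixel coordinates [u, v] in each view.
    P_left, P_right : 3x4 projection matrices (world frame = left camera frame).
    Returns the 3D point in world coordinates.
    """
    A = np.vstack([
        x_left[0] * P_left[2] - P_left[0],
        x_left[1] * P_left[2] - P_left[1],
        x_right[0] * P_right[2] - P_right[0],
        x_right[1] * P_right[2] - P_right[1],
    ])
    _, _, Vt = np.linalg.svd(A)     # solution = right singular vector of smallest value
    X = Vt[-1]
    return X[:3] / X[3]
```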

Figure 9. Block diagram of the head pose tracking module. Particle filtering is used to track the 2D locations of the facial features in the left and right frame.

We combine the two particle sets from the left and right image into a set of 3D particles by triangulating each left and right particle (one-to-one correspondence), and compare each 3D particle with the reference face model to calculate the weights π_k,3D. These weights are then assigned to the left and right sets (π_k,L and π_k,R), and the individual propagation for the next frame can start again. From each frame we can roughly estimate the 3D locations of the facial features by calculating the weighted average of the 3D particles (equation (8)). The reference face model is then fitted to these 3D points to refine the estimate of the head pose in the current frame. In the following subsections we describe the choice of the state, the observation model and the transition model used for the 2D tracking. After that, we discuss how the priors are used to take the interdependencies between the facial features into account.

5.4.1 State and Transition Model

We consider each facial feature as an object. For every facial feature i, the object state is represented by x_i = [u_i v_i u'_i v'_i]^T, with [u_i v_i] and [u'_i v'_i] as the current and the previous 2D image coordinates of that feature, respectively. We choose to include the previous image coordinates in order to take the object's motion velocity and trajectory into account. To simplify the evaluation of the transition density, we assume that p(x_i|x⁻) = p(x_i|x⁻_i), which means that each feature can be propagated individually. A second-order process with Gaussian noise is used for the individual propagation of each feature:

p(x_i|x⁻_i) ∼ \begin{bmatrix} 1+α & 0 & -α & 0 \\ 0 & 1+β & 0 & -β \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} x⁻_i + N(0, σ_n)    (11)

with α, β ∈ [0, 1] as weight factors that determine the strength of the contribution of the horizontal and vertical motion velocity of a particle in the transition model.

5.4.2 Observation Model

After the 2D particles are propagated in steps 1 and 4 of section 5.2, the weight of each sub-particle needs to be determined. This is done by evaluating the observation likelihood p(z|x_i). We use the same observation model as proposed in [16]. A template-based method is used as the measurement z from the images. The color difference between a reference template and an equally sized window centered on each sub-particle is used as a measure of the weight of the particle, that is, the probability of a sub-particle being the location of a facial feature. Let the reference template be r_i, and let the window centered on a sub-particle be o_i. The color-based difference is then defined as [16]:

c(o_i, r_i) = (o_i - E{o_i,Y}) - (r_i - E{r_i,Y})    (12)

where the subscript Y denotes the luminance component of the template and E{A} is the mean of all elements in A. The matrix c(o_i, r_i) contains the RGB color difference between o_i and r_i.
Finally, the scalar color distance between those two matrices is defined by:

d(o_i, r_i) = E{ρ(c(o_i, r_i))},   ρ(c(·)) = |c(·)_R| + |c(·)_G| + |c(·)_B|    (13)

where ρ(·) is a robust function defined as the L1-norm of the color channels per pixel. The observation likelihood is defined as:

p(z|x_i) ∝ ε_o + exp(-d(o_i, r_i)² / (2σ_o²))    (14)

where σ_o and ε_o are the model parameters (see figure 10). The parameter σ_o determines the steepness of the curve, that is, how fast the curve drops for bad particles (i.e. particles that have low similarity with the reference template). The parameter ε_o is used to prevent particles from getting stuck on local maxima when the object is lost. To improve the ability to recover from a lost object, ε_o should be small but non-zero [14].

Figure 10. The observation model.
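A small sketch of the observation likelihood of equations (12)-(14) is given below. The patches are H x W x 3 RGB arrays; the Rec. 601 luma weights and the values of sigma_o and eps_o in the example are assumptions, since the paper does not state them.

```python
import numpy as np

def luminance(img):
    # Rec. 601 luma weights -- an assumption; the paper does not state its luminance formula.
    return img @ np.array([0.299, 0.587, 0.114])

def observation_likelihood(window, template, sigma_o, eps_o):
    """Colour-based observation likelihood of equations (12)-(14).

    window, template : HxWx3 float RGB patches of equal size (o_i and r_i).
    """
    c = (window - luminance(window).mean()) - (template - luminance(template).mean())  # eq (12)
    rho = np.abs(c).sum(axis=2)          # L1 norm of the colour channels per pixel
    d = rho.mean()                       # eq (13): expectation over all pixels
    return eps_o + np.exp(-d**2 / (2.0 * sigma_o**2))   # eq (14), up to normalization

# Example with a made-up 11x11 patch; sigma_o and eps_o are illustrative values only.
rng = np.random.default_rng(1)
ref = rng.random((11, 11, 3))
obs = np.clip(ref + rng.normal(0, 0.05, ref.shape), 0, 1)
print(observation_likelihood(obs, ref, sigma_o=0.1, eps_o=0.05))
```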

5.4.3 Priors

To approximate the re-weighting process defined in equation (10), we use a similar approach as in the calculation of the observation likelihood. The prior information on the relative 3D positions of the facial features is now used. After we get the new particle sets from the left and right images, {s_k, π_k}_L and {s_k, π_k}_R, we combine these sets (one-to-one correspondence) by triangulating each particle pair, resulting in K 3D particles. The weight of each 3D particle π_k is then approximated by:

π_k = ε_p + exp(-d_k² / (2σ_p²))    (15)

where σ_p and ε_p are model parameters similar to those of the observation likelihood (equation (14)) and d_k is the difference between the reference face shape and the shape derived from the k-th 3D particle. To calculate this difference, the reference face shape is first rotated such that its face plane (figure 8) coincides with the face plane of the measured shape. The scalar distance d_k is then defined as:

d_k = sqrt( (1/M) Σ_{i=1}^{M} d_ik² )    (16)

where d_ik is the 3D spatial distance between feature i of the reference and feature i of the k-th 3D particle.

6 GAZE DIRECTION ESTIMATION

Having acquired the estimate of the head pose in the previous section, we now discuss the gaze direction estimation module and the intersection calculation module in detail. We begin by presenting the geometrical eye model used in our system in section 6.1. In section 6.2 the calculation of the 3D gaze vector is explained. Finally, the intersection between the gaze ray and the screen is dealt with in section 6.3.

6.1 Geometrical Eye Model

We use a 3D eyeball model similar to the model used by Matsumoto et al. [15] and Ishikawa et al. [12]. The eyeball is regarded as a sphere with radius r and center O (figure 11). We assume that the eyeball is fixed inside the eye socket, except for rotational movements around its center. Therefore the relative position of the center O and the eye corners is constant regardless of the head movements. Unlike Ishikawa et al., we also assume that the inner and outer corners of the eye socket (E_1 and E_2) are not located on the eyeball surface. It is easier to locate and track the eye corners than points on the eyeball surface, because these corners are more distinctive (figure 11). This also makes the tracking more robust to eye blinks. Furthermore, we assume that the anatomical axis of the eye coincides with the visual axis¹. The gaze direction is defined by a 3D vector going from the eyeball center O through the cornea center C.

Figure 11. The eyeball model used in our system. The capital letters denote 3D points in world coordinates. The gaze direction v̂_g is defined as a 3D vector from the eyeball center O pointing to the cornea center C. The points E_1 and E_2 are the inner and outer corners of the eye socket.

Our 3D eyeball model consists of two parameters:

- the radius of the eyeball r,
- the relative position of the eyeball center with respect to the eye corners.

The relative position of the eyeball center is defined as a 3D vector from the mid-point of the eye corners M to the eyeball center O, termed the offset vector d. These parameters are determined for each person by taking a training sequence in which the gaze points of that person are known. The training sequence is acquired by recording the user's head pose and cornea center locations while he is looking at several calibration points on the screen. Since we know the locations of the calibration points, we can calculate the gaze vectors to these points. If we consider only one calibration point P, the gaze vector is determined by

v_g = P - C,   v̂_g = v_g / ‖v_g‖

with v̂_g as the normalized gaze vector when the eye gaze is fixed on point P (see figure 11).

¹ The anatomical axis is defined as the vector from the eyeball center to the center of the lens, while the visual axis is defined as the vector connecting the fovea and the center of the lens. The visual axis represents the true gaze direction. On the retina, the image that we see is projected at the fovea, which is slightly above the projection of the optical axis.
If we consider only one calibration point P, the gaze vector is determined by v g = P C, ˆv g = v g v g 1 Anatomical axis is defined as the vector from eyeball center to the center of the lens, while visual axis is defined as the vector connecting the fovea and the center of the lens. The visual axis represents the true gaze direction. On the retina, the image that we see will be projected at the fovea, which is slightly above the projection of the optical axis. 9

The relation between the gaze vector and the unknown parameters r and d is reflected by the equation:

d + r v̂_g = C - M    (17)

This equation cannot be solved on its own, because we have 4 unknowns (the radius r and the offset vector d = [d_x, d_y, d_z]) and only 3 equations (one for each of the x, y and z components). If we combine the left and right eye, assuming the same eyeball radius, we still have 7 unknowns and 6 equations. Therefore, we need at least 2 calibration points to estimate the eyeball parameters for each user. The generalized matrix equation for N calibration points can be derived from equation (17), written in the form Ax = b:

\begin{bmatrix} v̂_{gL,1} & I & 0 \\ \vdots & \vdots & \vdots \\ v̂_{gL,N} & I & 0 \\ v̂_{gR,1} & 0 & I \\ \vdots & \vdots & \vdots \\ v̂_{gR,N} & 0 & I \end{bmatrix} \begin{bmatrix} r \\ d_L \\ d_R \end{bmatrix} = \begin{bmatrix} C_{L,1} - M_{L,1} \\ \vdots \\ C_{L,N} - M_{L,N} \\ C_{R,1} - M_{R,1} \\ \vdots \\ C_{R,N} - M_{R,N} \end{bmatrix}    (18)

Solving this matrix equation in the least-squares sense leads to the desired eyeball parameters. Note that the calculation is done in the face coordinate system (see figure 8); otherwise equation (18) would not be valid.

6.2 Estimating the Gaze Vector

Once the eyeball parameters are estimated, we can estimate the gaze direction. The overview of the gaze direction estimation module is given in figure 12.

Figure 12. Detailed block diagram of the gaze direction estimation module.

Figure 13. The ROI defined between the inner and outer eye corners. The small dot in the middle of the circle represents the 2D cornea center.

From the head pose tracking module we get the 3D locations of all facial features. However, for gaze direction estimation we only need the 2D and 3D positions of the inner and outer eye corners (for the left and right eye). This information is used to estimate the cornea center and eyeball center locations.

6.2.1 Finding the Eyeball Center

We calculate the locations of the left and right eyeball centers separately by using the following equation:

O = ½(E_1 + E_2) + d = M + d    (19)

where d is the offset vector obtained from the training sequence.

6.2.2 Finding the Cornea Center

To find the cornea center we first project the 3D eye corners back to the left and right 2D image planes. A small ROI in the image is then defined between the inner and outer corner locations (figure 13). Then, template matching with a disk-shaped template on the intensity image is used to approximately locate the cornea. After that, we define an even smaller ROI around the initial cornea center location and apply the circular Hough transform on the edge image of the smaller ROI. The second ROI is used to filter out irrelevant edges. The pixel position with the highest confidence (most votes) is the estimate of the cornea center. The steps described above are done for the left and right image separately. The left and right 2D cornea center locations are then triangulated to find the 3D location.
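The following is a minimal sketch of the circular Hough voting used to refine the cornea center: edge pixels inside the second ROI vote for candidate centers at a set of assumed cornea radii, and the cell with the most votes is returned. The edge detection and ROI handling are left out, and the discretization choices are illustrative, not the authors' exact implementation.

```python
import numpy as np

def hough_circle_center(edge_mask, radii):
    """Vote for circle centers given a binary edge mask (section 6.2.2, simplified).

    edge_mask : HxW boolean array of edge pixels inside the cornea ROI.
    radii     : iterable of candidate cornea radii in pixels.
    Returns the (row, col) center with the most votes.
    """
    H, W = edge_mask.shape
    acc = np.zeros((H, W))
    ys, xs = np.nonzero(edge_mask)
    thetas = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
    for r in radii:
        # Every edge pixel votes for all centers lying at distance r from it.
        cy = (ys[:, None] + r * np.sin(thetas)[None, :]).round().astype(int).ravel()
        cx = (xs[:, None] + r * np.cos(thetas)[None, :]).round().astype(int).ravel()
        ok = (cy >= 0) & (cy < H) & (cx >= 0) & (cx < W)
        np.add.at(acc, (cy[ok], cx[ok]), 1)
    return np.unravel_index(np.argmax(acc), acc.shape)
```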
6.2.3 The 3D Gaze Vector

After finding the 3D cornea center location C and the 3D eyeball center O for the left and right eye, the gaze vector for the current frame is calculated by

v_g = C - O,   v̂_g = v_g / ‖v_g‖    (20)

The normalized left and right gaze vectors are finally forwarded to the intersection calculation module (see figure 12).

6.3 Intersecting the Gaze Vector with the Screen

The overview of the intersection calculation module is shown in figure 14. To intersect the gaze ray with the screen we need information about the screen location. In figure 15, the gaze direction is projected onto the screen at point P. The resulting gaze ray can be written in parametric representation as:

g(t) = O + v̂_g t    (21)

where O is the eyeball center and v̂_g is the unit gaze vector. For a certain scalar t, the gaze ray will intersect the screen at point P. By using the knowledge that the dot product of every point in a plane with the plane's normal is a constant [9],

N · P = N · O_s = c,

and the parametric representation of the gaze ray in equation (21), we can obtain the value t_P at which the gaze ray intersects the screen plane:

N · (O + v̂_g t_P) = N · O_s
t_P = (N · O_s - N · O) / (N · v̂_g)    (22)

Equation (22) can be further simplified if we do the calculation in the screen coordinate system. We then have O_s = 0 and N = [0 0 1]^T, reducing the calculation to a division of two scalars:

t_P = -o_z / v̂_g,z    (23)

where o_z is the z component of the eyeball center (in the screen coordinate system). For the output of the whole system, the average of the projected gaze rays from the left and right eyes is taken to compensate for the effect of noise.
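Equations (19)-(23) combine into a short computation per eye, sketched below. The offset vector d is assumed here to be already expressed in the same (world) frame as the eye corners; in the paper it is stored in the face coordinate system and rotated along with the estimated head pose. R_ws and T_ws are the world-to-screen transformation from section 4.2.

```python
import numpy as np

def gaze_on_screen(E1, E2, d, C, R_ws, T_ws):
    """Project one eye's gaze onto the screen (equations (19)-(23)).

    E1, E2 : 3D inner/outer eye corners, C : 3D cornea center (world frame).
    d      : eyeball-center offset, assumed already rotated into the world frame.
    R_ws, T_ws : world-to-screen transformation from the registration step.
    """
    O = 0.5 * (E1 + E2) + d                  # eyeball center, equation (19)
    v = C - O
    v_hat = v / np.linalg.norm(v)            # gaze direction, equation (20)
    # Work in screen coordinates so the screen plane is z = 0 (equation (23)).
    O_s = R_ws @ O + T_ws
    v_s = R_ws @ v_hat
    t_p = -O_s[2] / v_s[2]
    P = O_s + t_p * v_s                      # intersection point on the ray, equation (21)
    return P[:2]                             # 2D screen coordinates (x, y)

# The system output averages the left- and right-eye projections:
# gaze_point = 0.5 * (gaze_on_screen(*left_args) + gaze_on_screen(*right_args))
```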

Figure 14. Detailed block diagram of the intersection calculation module.

Figure 15. Illustration of the ray-plane intersection.

7 EXPERIMENTAL RESULTS

In this section we evaluate the performance of each module of the gaze tracking system. The calibration and screen registration results are presented in section 7.1. Section 7.2 discusses the tracking performance of the auxiliary particle filtering. The gaze training and estimation results are shown in section 7.3, and finally, we test the whole system by applying it to some sequences in section 7.4.

7.1 Stereo Calibration and Screen Registration

To calibrate the web camera pair, we took 16 image pairs (320x240 pixels) of the checkerboard calibration grid in various positions. The first eight shots were taken while the grid was held about 50 cm away from the camera. The remaining shots were made while holding the grid about 120 cm away from the camera. Table I shows the estimated camera parameters. (Note that each rotation matrix is represented by three rotation angles, one for the x, y and z axis respectively.) The results in this table indicate that the average horizontal and vertical reprojection errors are very small (below 0.1 pixel). The reprojection error remains relatively constant if fewer than 16 images are taken, but this results in a larger error in the estimated parameters.

TABLE I. Stereo camera calibration results: optimized intrinsic parameters (focal lengths, principal points, radial and tangential distortion coefficients) and average reprojection error for the left and right camera, and the extrinsic parameters (rotation angles in degrees and translation in mm) relating the two cameras, each with its standard deviation.

For the screen registration we took another 5 shots containing the mirror in various positions (figure 16). By using the method described in section 4.2, we could compute the position of the screen with respect to the world frame for each stereo image pair. The estimated world-to-screen transformation M_ws for each mirror position is listed in table II.

Figure 16. An example of the shots for the screen registration. The images shown here were taken from the left camera.

Figure 17. Example of the head pose tracking with particle filtering. The results presented here were taken from the left camera for frames 1 (user initialization), 51, 86 and 122 (from left to right and top to bottom).

TABLE II. Screen registration results: the world-to-screen transformation M_ws.

Mirror position | Rotation angles (R_α, R_β, R_γ) (deg.) | Translation vector (T_x, T_y, T_z) (mm)
#1 | (25.46, 9.58, 1.74) | (65.42, , )
#2 | (24.96, 8.59, 1.71) | (67.72, , )
#3 | (25.40, 8.68, 1.80) | (64.73, , )
#4 | (24.44, 8.59, 1.64) | (69.45, , )
#5 | (24.90, 10.43, 1.45) | (70.04, , )
Mean | |
Standard deviation | |

We can see from the standard deviations of the rotation angles and the translation vectors that the screen registration method is accurate to within about 2 mm translation error and less than 1° rotation error. The mean value of the transformation is used to determine the screen location in the intersection calculation module.

7.2 Head Pose Tracking

Figure 17 shows an example of the head pose tracking using K = 100 particles. The tracking was performed by choosing α = 0.7 and β = 0.5 for the horizontal and vertical speed components respectively, with a noise standard deviation of σ_n = 1.8 pixels (see equation (11)). The choice of these parameters depends strongly on the expected speed of the head movements. If only slow movements are present, we can choose smaller values for α, β and σ_n, thereby improving the tracking precision (smaller jitter). Using larger values decreases the precision, but makes the tracking more robust to faster movements.

Figure 18. The reference face shape model (above) and the estimated face shape in the face coordinate system for all frames (below).

When we compare the estimated face shape over all frames of the same sequence, the shape apparently varies slightly over time (figure 18). This is caused by the stochastic nature of particle filtering. The statistics are shown in table III. This variation would render the user's eyeball model useless, because it assumes that the eye corners are fixed with respect to the whole face shape model. This is the reason that we fit the reference shape to the estimated shape (section 5.4). In this way the rigidness of the face shape in each frame is preserved.
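The fitting of the reference shape to the estimated 3D feature locations is a rigid alignment of two small point sets. The paper does not name the algorithm used; a standard choice is the SVD-based least-squares (Kabsch) fit sketched below.

```python
import numpy as np

def fit_rigid(reference, measured):
    """Least-squares rigid alignment (Kabsch): find R, t with measured ~ R @ reference + t.

    reference, measured : (M, 3) arrays of corresponding 3D facial feature points.
    Returns R (3x3), t (3,) and the aligned reference shape.
    """
    ref_c = reference - reference.mean(axis=0)
    mea_c = measured - measured.mean(axis=0)
    U, _, Vt = np.linalg.svd(mea_c.T @ ref_c)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])   # avoid a reflection
    R = U @ D @ Vt
    t = measured.mean(axis=0) - R @ reference.mean(axis=0)
    return R, t, reference @ R.T + t
```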

TABLE III. Statistics of the estimated face shape: standard deviations (in mm, along x, y and z) of the two mouth corners, the two left-eye corners and the two right-eye corners.

TABLE IV. Estimated eyeball parameters (in mm) for the left and right eye: the radius r and the offset vector components d_x, d_y, d_z, together with the x and y differences between ground truth and measurement, their average error and their standard deviation.

7.3 Gaze Direction Estimation

Before we could estimate the gaze direction, we trained the system in order to estimate the user-dependent model parameters (see section 6.1). The training sequence was acquired by recording the user's eye corners and cornea positions while he was looking at 4 calibration points in the corners of the screen. After that we estimated the eyeball parameters by solving equation (18). The results are summarized in table IV.

We analyzed the effect of errors in two quantities on the overall gaze error: the cornea center and the eyeball center. Together they determine the gaze vector (section 6.2). A new sequence was acquired while the user was looking at one point with his head fixed. Since the head and cornea were fixed, the variations in the tracked eye corners (and thus, indirectly, the eyeball center) and the cornea center locations were only caused by the algorithm. The standard deviations of the eyeball center and cornea center are shown in table V. We can see that the cornea fitting produced almost twice as large a deviation as the eyeball center. For the following calculation, the mean of the cornea centers over all frames, in which the head and eyes are really steady, was considered the true location. The same was done for the mean of the eyeball centers over all frames.

TABLE V. Tracking and cornea fitting error: standard deviations (in mm, along x, y and z) of the eyeball center and the cornea center for the left and right eye.

TABLE VI. Effect of individual parameter errors on the gaze projection error: standard deviations (in mm, in x and y) of the gaze projection in three experiments, for the cases in which no parameter is fixed, the eyeball center is fixed, and the cornea center is fixed.

The gaze projections on the screen were calculated in three passes. First, the gaze vector in each frame was calculated as usual by equation (20). In the second pass, we held the cornea center constant over all frames by taking its mean. In the last pass, the eyeball center location was held constant over all frames, again by taking its mean. This experiment was repeated 3 times on the same sequence to make the results more reliable, since particle filtering is stochastic in nature (table VI). The results indicate that errors in the cornea center fitting have the largest influence on the vertical gaze projection error. If the noise in the cornea center is removed (by taking the average), the spread of the gaze projection error becomes smaller and rounder (see also figure 19).

Figure 19. The plot of the gaze projections of the first experiment of table VI. The symbols represent the mean gaze projection (+), the gaze points when no parameters are fixed, the points with a fixed eyeball center, and the points with a fixed cornea center (each drawn with a different symbol).

7.4 Overall Performance

In this section we present the overall results of the gaze tracking system when applied to the training sequence and a test sequence. Both sequences were recorded while a person was looking at the same 4 calibration points (figure 20). As we can see, the average error of the gaze projection on the screen is about 6 cm, which corresponds to an angular error of about 7° at a distance of 50 cm. Figure 21 shows the 3D gaze vectors of the left and right eyes projected back onto the image plane.

When we compare the gaze direction estimates for all 4 calibration points, we see that the projected gaze points on the lower part of the screen have a much smaller and rounder spread. This is caused by the cornea fitting error. Since the cameras were located below the screen, we had an almost frontal view of the face when the user was looking at the lower part of the screen. Hence, the fitting produced a smaller error because the cornea image has a circular form. The further the user's gaze moves away from the camera, the more elliptic the cornea image projection becomes, making it more difficult to fit. As a result, a spread similar to that in section 7.3 was observed for the gaze toward the upper part of the screen.

8 CONCLUSIONS AND RECOMMENDATIONS

In this paper a gaze tracking system based on particle filtering and stereo vision is presented. We propose to track facial features in 2D by particle filtering, and to use the stereo information to estimate the head pose of a user in 3D. Together with a 3D eyeball model, the 3D gaze direction can be estimated. For gaze tracking applications in visual perception research, we need to know the projection of the user's gaze on the screen where the visual stimuli are presented. We devised a screen registration scheme in order to accurately locate the screen with respect to the cameras. With this information, the gaze projection on the screen can be calculated.

The results achieved by our gaze tracking scheme are promising. The average gaze projection error is about 7°, only a few degrees off the specified requirement. At a user-monitor distance of 50 cm and with a 17-inch screen (about 30x24 cm), this means that we can distinguish gaze projections on the screen in about 5x3 distinct blocks.

There is still room for improvement in our gaze tracking system. Based on the results in the previous section, the cornea fitting should be the main concern, since errors in this part have the greatest influence on the overall gaze error. A higher resolution for the cornea fitting is needed, for example by using a larger image resolution or by fitting the cornea with sub-pixel accuracy. Together with a more sophisticated fitting algorithm such as ellipse fitting [7], better results should be achievable. Another possible source of error is the exclusion of the difference between the anatomical and visual axes of the eye from the 3D eye model. Compensating for this difference will also reduce the overall gaze tracking error. The head pose tracking module still has some difficulty tracking persons with glasses and fast head movements. The use of lighting and rotation-invariant templates might help to reduce tracking loss. Furthermore, some smoothing in the temporal domain could help to reduce the jitter in the estimated facial feature locations.
Finally, to eliminate the manual selection of the facial features in the initialization phase, the possibility of automatically locating these features should be explored.

Figure 20. The plots of projected gaze points from the training sequence (above) and the test sequence (below). The gaze points for each calibration point are represented by a different symbol.

REFERENCES

[1] Applied Science Laboratories (ASL), USA. (Last visited: 22 October 2004)
[2] Baluja, S. and Pomerleau, D., Non-intrusive Gaze Tracking using Artificial Neural Networks, Report no. CMU-CS, Carnegie Mellon University.

Figure 21. Results from the detection of the gaze direction. The vectors are drawn starting from the cornea centers of the left and right eye, respectively.

[3] Bouguet, J.Y., Camera Calibration Toolbox for MATLAB. (Last visited: 28 September 2004)
[4] Duchowski, A.T., Eye Tracking Methodology: Theory and Practice, London: Springer.
[5] Ebisawa, Y., Improved Video-based Eye-gaze Detection Method, IEEE Transactions on Instrumentation and Measurement, 47(4).
[6] Eye Response Technologies (ERT), USA. (Last visited: 22 October 2004)
[7] Fitzgibbon, A.W., Pilu, M. and Fischer, R.B., Direct Least-squares Fitting of Ellipses, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(5).
[8] Fourward Technologies, Inc., USA. (Last visited: 22 October 2004)
[9] Glassner, A.S., Graphics Gems, Cambridge: Academic Press.
[10] Isard, M. and Blake, A., Condensation - Conditional Density Propagation for Visual Tracking, International Journal of Computer Vision, 29(1):5-28.
[11] Isard, M. and Blake, A., ICondensation: Unifying Low-level and High-level Tracking in a Stochastic Framework, Proceedings of the 5th European Conference on Computer Vision, vol. 1.
[12] Ishikawa, T., Baker, S., Matthews, I. and Kanade, T., Passive Driver Gaze Tracking with Active Appearance Models, Proceedings of the 11th World Congress on Intelligent Transportation Systems.
[13] Ji, Q. and Zhu, Z., Eye and Gaze Tracking for Interactive Graphic Display, International Symposium on Smart Graphics.
[14] Lichtenauer, J., Reinders, M. and Hendriks, E., Influence of the Observation Likelihood Function on Particle Filtering Performance in Tracking Applications, Proceedings of the 6th IEEE International Conference on Automatic Face and Gesture Recognition.
[15] Matsumoto, Y. and Zelinsky, A., An Algorithm for Real-time Stereo Vision Implementation, Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition.
[16] Patras, I. and Pantic, M., Particle Filtering with Factorized Likelihoods for Tracking Facial Features, Proceedings of the 6th IEEE International Conference on Automatic Face and Gesture Recognition.
[17] Reingold, E.M., McConkie, G.W. and Stampe, D.M., Gaze-contingent Multiresolutional Displays: An Integrative Review, Human Factors, 45(2).
[18] Tobii Technology AB, Sweden. (Last visited: 22 October 2004)
[19] Wooding, D., Eye Movement Equipment Database (EMED), UK. (Last visited: 22 October 2004)
[20] Zhang, Z., A Flexible New Technique for Camera Calibration, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11).
