A Bayesian Framework for Real-Time 3D Hand Tracking in High Clutter Background
Hanning Zhou, Thomas S. Huang
University of Illinois at Urbana-Champaign
405 N. Mathews, Urbana, IL 61801, U.S.A.

Abstract

Robust tracking of global hand motion against cluttered backgrounds is an important task in human-computer interaction and in the automatic interpretation of American Sign Language. It remains an open problem due to lighting variation, background clutter and occlusion. In this paper, a Bayesian framework is proposed that incorporates a hand shape model, a skin color model and image observations to recover the position and orientation of the hand in 3D space from monocular images. The robustness of our approach has been verified with extensive experiments.

1 Introduction

Hand gestures can be a more natural way for humans to interact with computers. For instance, one can use his or her hand to manipulate virtual objects directly in a virtual environment. However, capturing human hand motion is inherently difficult due to its articulation and variability. One way to solve the problem is divide and conquer [Wu and Huang, 1999], i.e., decoupling hand motion into the global motion of a rigid model and the finger movements, and solving the two iteratively until convergence is reached. This approach demands (1) a robust algorithm to recover the 3D position and orientation of the hand, and (2) an efficient algorithm to recover the finger configuration.

The first problem has been extensively explored. Black and Jepson [Black and Jepson, 1996] used an appearance-based model in eigenspace to recover the 2D position of a bounding box. The more general case of 3D curve matching has been addressed by Zhang [Zhang, 1994]. Blake and Isard [Blake and Isard, 1998] used active contours to track global hand motion and recover the global hand position; their model is a deformable 2D planar curve, which cannot handle out-of-plane rotation. O'Hagan et al. [O'Hagan et al., 2001] developed a real-time HCI system that tracks fingertips from stereo views; it requires a clean background, and its accuracy was only evaluated using a grid pattern. Among the many tracking techniques, particular interest has been put on combining multiple cues [Wu and Huang, 2001], [Tao et al., 2000], [Birchfield, 1998], including color segmentation, edges and motion.

In this paper, we propose a Bayesian framework to combine the a priori knowledge of the color and shape of the hand with the observations from image sequences. Within this framework, an algorithm based on ICP (iterative closest point) [Zhang, 1994] is used to find the maximum likelihood solution
for the position and orientation of a rigid planar model of the hand in 3D space, given images captured with a single camera.

Section 2 establishes a Bayesian network to describe the generative model. Section 3 introduces a novel feature, the likelihood edge, and the observation process. In Section 4, the tracking problem is formulated as inference in the Bayesian network and solved with an ICP-based algorithm. Section 5 provides experimental results in both quantitative and visual form. Section 6 discusses the applicable situations and limitations of this approach and directions for future extension and improvement.

2 Bayesian Hand Tracking

A single-view sequence of color images is given by scaled orthographic projection of a rigid planar model of the hand undergoing homogeneous transformation. The a priori knowledge λ includes a rigid planar model of the hand contour, given as {s_i = (m_i, dm_i), i = 1, 2, ..., n}, where m_i = [u_i, v_i] are the 2D coordinates and dm_i = [du_i, dv_i] is the normal direction (pointing from the inner region to the outer region) of the contour at s_i, together with the initial pose (3D position and orientation) of the hand. The tracking problem is formulated as finding the corresponding sequence of rotation-translation matrices M = [R T] with respect to the initial pose. Figure 1 shows the Bayesian network describing the dependencies in the generative model, where LE and GE denote the likelihood edge and the grayscale edge respectively, as defined in Section 3.

Figure 1: The Bayesian network for hand tracking.

The observed features LE and GE (edge points in the likelihood ratio image and the grayscale image) are generated from a distribution p(edge | M, λ, LC). Assuming the prior p(M | λ, LC) is uniformly distributed over the subspace of feasible transformations, by Bayes' rule

p(M | edge, λ, LC) = p(edge | M, λ, LC) p(M | λ, LC) / p(edge | λ, LC)

and we can maximize the posterior p(M | edge, λ, LC) ∝ p(edge | M, λ, LC) p(M | λ, LC) by maximizing the likelihood p(edge | M, λ, LC), since p(edge | λ, LC) is independent of M. As LE and GE are independent given M and LC (i.e., d-separated [Jensen, 1996]), the likelihood function p(edge | M, λ, LC) can be decomposed as

p(edge | M, λ, LC) = p(LE | M, λ, LC) p(GE | M, λ, LC)    (1)
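Since LE and GE are conditionally independent given M and LC, a candidate transformation M can be scored in the log domain by summing the two feature log-likelihoods. The sketch below illustrates Equation (1) under an assumed i.i.d. Gaussian model for the point-matching residuals; the function names and the Gaussian assumption are illustrative, not part of the paper.

```python
import numpy as np

def feature_log_likelihood(residuals, sigma=2.0):
    # Illustrative assumption: i.i.d. Gaussian residuals between warped
    # model points and their matched edge points (up to a constant).
    r = np.asarray(residuals, dtype=float)
    return -0.5 * np.sum((r / sigma) ** 2)

def pose_score(residuals_le, residuals_ge):
    # Equation (1) in the log domain: d-separation of LE and GE given
    # M and LC turns the product of likelihoods into a sum of logs.
    return feature_log_likelihood(residuals_le) + feature_log_likelihood(residuals_ge)
```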
The observation and inference in the Bayesian network can be implemented with the flow chart shown in Figure 2; each component of the flow chart is explained in detail in the following sections.

Figure 2: The flow chart for Bayesian hand tracking.

3 Observation: Extract Matching Candidates

Each frame captured by the camera is an RGB image denoted by I_k, which is converted to a grayscale image G_k and an HSI (hue, saturation and intensity) image H_k.
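As a concrete sketch of this first stage of the flow chart, the snippet below converts a captured frame into G_k and a color representation standing in for H_k. OpenCV offers HSV rather than HSI, so the HSV conversion here is an assumption used as an approximation.

```python
import cv2

def preprocess_frame(frame_bgr):
    # Grayscale image G_k, used later for the grayscale edge GE.
    G_k = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # HSV stand-in for the HSI (hue, saturation, intensity) image H_k,
    # used later for the skin-color likelihood ratio image L_k.
    H_k = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    return G_k, H_k
```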
The HSI image is further mapped to a likelihood ratio image L_k by the function defined in Equation (2):

L_k(u, v) = p(H_k(u, v) | skin) / p(H_k(u, v) | nonskin)    (2)

Since

p(H_k(u, v) | skin) = p(skin | H_k(u, v)) p(H_k(u, v)) / p(skin)    (3)

p(H_k(u, v) | nonskin) = p(nonskin | H_k(u, v)) p(H_k(u, v)) / p(nonskin)    (4)

Equation (2) can be evaluated as

L_k(u, v) = p(skin | H_k(u, v)) p(nonskin) / (p(nonskin | H_k(u, v)) p(skin))    (5)

Jones and Rehg [Jones and Rehg, 1999] used a standard likelihood ratio approach [Fukunaga, 1990], but quantitative information is lost when the likelihood ratio is thresholded to decide whether a pixel belongs to the skin or nonskin region. To preserve the sufficient statistics, we use the likelihood ratio without thresholding. The candidate correspondences of the sample points m_i are the edge points in the likelihood ratio image, called the likelihood edge: LE = {sh_j = (le_j, dle_j), j = 1, 2, ..., n_LE}, where le_j = [u_j, v_j] denotes the 2D coordinates and dle_j = [du_j, dv_j] denotes the gradient. Similarly, the grayscale edge GE = {sg_j = (ge_j, dge_j), j = 1, 2, ..., n_GE} is extracted from the grayscale image G_k.
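A minimal sketch of this observation stage, assuming the skin and nonskin color densities are stored as normalized hue histograms (the density representation is an assumption here; Jones and Rehg built RGB histograms). The same edge extractor is applied to L_k for the likelihood edge and to G_k for the grayscale edge.

```python
import numpy as np

def likelihood_ratio_image(hue, skin_hist, nonskin_hist, eps=1e-6):
    # Equation (2): per-pixel ratio p(color | skin) / p(color | nonskin),
    # kept unthresholded to preserve the sufficient statistics.
    return skin_hist[hue] / (nonskin_hist[hue] + eps)

def edge_points_with_gradients(img, thresh):
    # Edge points and unit gradient directions: (le_j, dle_j) when applied
    # to L_k, (ge_j, dge_j) when applied to G_k.
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ys, xs = np.nonzero(mag > thresh)
    d = np.stack([gx[ys, xs], gy[ys, xs]], axis=1)
    d /= np.linalg.norm(d, axis=1, keepdims=True) + 1e-12
    return np.stack([xs, ys], axis=1), d
```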
4 Inference: Recover 3D Motion

Under moderate out-of-plane rotation and finger articulation, a human hand can be approximated as a planar object. Taking the centroid of the hand model as the origin of the world coordinate system and the z-axis as pointing out of the frontal side of the palm, the 3D coordinates of each sample point are p_i = [x_i[1] − x̄[1], y_i[1] − ȳ[1], 0]^T, where (x̄[1], ȳ[1]) is the centroid of the model. The transformation from time instant 1 to k can be expressed as

p_i[k] = R p_i[1] + T,  i = 1 ... N    (6)

Given image points of a planar object from two perspective camera views, it is a classical stereo vision problem to search for correspondences and solve for [R T] between the two camera coordinate systems [Tsai and Huang, 1981], [Tsai et al., 1982], [Weng et al., 1992], [Weng et al., 1993], [Hartley and Zisserman, 2000], [Faugeras et al., 2001]. However, a general solution is usually sensitive to observation noise. In our special case a canonical planar model is given, based on which an ICP-based algorithm [Zhang, 1994] can be used to iteratively search for correspondences and solve for the homography.

4.1 Matching With The Model

We use the homography matrix H from the previous frame (H is the identity matrix in the very first frame) to warp the model with Equations (9) and (10). Matching a warped model point m_i[k] with observed edge points g_j[k], l_j[k] can be formulated as the optimization problem in Equation (7):

F_GE(i) = argmin_{j ∈ N} d(s_i[k], g_j[k])
F_LE(i) = argmin_{j ∈ N} d(s_i[k], l_j[k])    (7)

where N denotes the search region, and the distance measure is defined as follows:

d(s_i, g_j) = w_1 (m_i − g_j)^T Σ^{-1} (m_i − g_j) + w_2 ⟨dm_i, dg_j⟩
d(s_i, l_j) = w_1 (m_i − l_j)^T Σ^{-1} (m_i − l_j) + w_2 ⟨dm_i, dl_j⟩    (8)

In d(s_i, g_j), the first term is the Mahalanobis distance [Duda and Hart, 1973] between m_i and g_j; Σ, the covariance matrix of g, is approximated by a diagonal matrix whose element σ_i equals the inverse of the strength of the edge point g_i. In order to discriminate between edges belonging to two adjacent fingers, the second term uses the inner product of the normal direction of the model point and that of the corresponding edge point. w_1 and w_2 are weights trading off position information against orientation information. The notation for d(s_i, l_j) is analogous. This optimization problem is solved by a nearest-neighbor search, with the search region adapted according to the distribution of the distances between correspondences in the previous frame.

4.2 Estimating the 3D Homography Transformation

Assuming the intrinsic camera parameter matrix Π is approximately invariant and can be estimated beforehand, and that the camera optic center at time instant 1 lies at distance d along the Z direction, we can express the 3D coordinates and 2D projection of the i-th sample point as p'_i[1] = [x_i[1] − x̄[1], y_i[1] − ȳ[1], z_i[1]]^T (with z_i[1] = d) and m_i[1] = (1/d) Π p'_i[1], where (x̄[1], ȳ[1]) is the centroid of the planar model.

To reduce error accumulation, we estimate motion from the very first frame to frame k. Since the index i of each model point is consistent across all frames, the correspondence in Equation (7) is equivalent to correspondences between the edge points in frame k and the model points in the very first frame. Denote the edge points in frame k as x_i[k] = F_GE(i)[k] (i = 1 ... N) and x_j[k] = F_LE(j)[k] (j = N+1 ... 2N), and the model points in frame 1 as x_i[1] = m_i[1] (i = 1 ... N) and x_j[1] = m_{j−N}[1] (j = N+1 ... 2N); then we have x_i[k] ≅ H x_i[1] (i = 1 ... 2N). Given the correspondences, H can be solved up to a scale with the four-point algorithm [Faugeras et al., 2001]. We warp the model p'_i[1] with the homography H, project it with the intrinsic parameter matrix Π, and go back to the matching step to search for correspondences:

p'_i[k] = H p'_i[1]    (9)

m_i[k] = (1/z_i[k]) Π p'_i[k]    (10)
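The following sketch implements the matching step of Equations (7) and (8) as a brute-force nearest-neighbor search, together with a standard DLT estimate of H from the resulting correspondences as a stand-in for the four-point algorithm of [Faugeras et al., 2001]. The weights w1, w2 and the exhaustive search (in place of the paper's adaptive search region) are simplifying assumptions.

```python
import numpy as np

def match_model_points(model_pts, model_normals, edge_pts, edge_dirs,
                       edge_strength, w1=1.0, w2=1.0):
    # Equation (8): Sigma is diagonal with sigma_j = 1 / edge strength,
    # so Sigma^{-1} contributes a factor equal to the strength itself.
    matches = []
    for m, dm in zip(model_pts, model_normals):
        diff = edge_pts - m                    # (n_edges, 2) displacements
        maha = np.sum(diff * diff, axis=1) * edge_strength
        orient = edge_dirs @ dm                # inner products <dm_i, dg_j>
        matches.append(int(np.argmin(w1 * maha + w2 * orient)))
    return np.array(matches)

def homography_dlt(x1, x2):
    # Direct linear transform: solve x2 ~ H x1 up to scale from >= 4
    # point correspondences via SVD.
    A = []
    for (u, v), (up, vp) in zip(x1, x2):
        A.append([0, 0, 0, -u, -v, -1, vp * u, vp * v, vp])
        A.append([u, v, 1, 0, 0, 0, -up * u, -up * v, -up])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 3)
```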
4.3 Final Pose: From the 3D Homography to R and T

When the ICP algorithm converges, the 3D homography H is solved from m_i[k] = H m_i[1], i = 1, 2, ..., N, where m_i[1] are the sample points in the very first frame and m_i[k] those in the current frame, with the SVD-based algorithm [Faugeras et al., 2001]. From H, we can find a unique solution for R' and T' (see Appendix I), which relate p'_i[1] (the sample points in camera coordinates at time 1) to p'_i[k]. To find [R T] between p_i[1] and p_i[k] in world coordinates as defined in Equation (6), we notice that

p_i[k] = R' p_i[1] + (R' − I)[0, 0, d]^T + T'    (11)

thus R = R' and T = (R' − I)[0, 0, d]^T + T'.
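Once R' and T' are recovered from H (Appendix I; [Faugeras et al., 2001]), the change of coordinates in Equation (11) is a few lines of linear algebra. A minimal sketch, assuming R' and T' are already available:

```python
import numpy as np

def world_motion_from_camera_motion(R_prime, T_prime, d):
    # Equation (11): the world origin sits at depth d in front of the
    # camera at time 1, so R = R' and T = (R' - I)[0, 0, d]^T + T'.
    offset = np.array([0.0, 0.0, d])
    R = R_prime
    T = (R_prime - np.eye(3)) @ offset + T_prime
    return R, T
```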
5 Experimental Results

5.1 Quantitative Results for Global Motion

The trajectories of the translation and rotation parameters are shown in Figure 3. The mean square error (MSE) in each dimension is shown in Table 1, where R(X), R(Y), R(Z) denote rotation about the X, Y, Z axes respectively and T(X), T(Y), T(Z) denote translation along the X, Y, Z axes.

Figure 3: Trajectories of the translation and rotation parameters (Rot(X), Rot(Y), Rot(Z), T(X), T(Y), T(Z) versus frame number). The green lines show the original transformation used to synthesize the image sequence; the red dots show the output of the global tracking algorithm.

Table 1: MSE, the range of motion (ROM) and the percentage value of their ratio in each dimension.

            R(X)   R(Y)   R(Z)   T(X)   T(Y)   T(Z)
MSE
ROM
Ratio (%)

5.2 Demonstration Using Real-World Data

We have implemented the system; it executes at 29 frames per second on a Pentium III 1.0 GHz processor. Figure 4 shows snapshots from various video clips: simultaneous in-plane rotation, translation and slight out-of-plane rotation; occlusion by another hand with very similar pose and shape for a significant period of time, with lighting variation; and strong out-of-plane rotation with lighting variation.

Figure 4: Snapshots of the tracking result. These and other AVI sequences are available at the author's web site hzhou/histedge.

Compared with the hand tracker by Blake and Isard [Blake and Isard, 1998], our tracker has the following advantages: (1) it can recover not only in-plane rotation but also out-of-plane rotation; (2) it is robust against cluttered backgrounds; and (3) it is robust against lighting variation.

6 Conclusions

In this paper, we propose a Bayesian framework to combine the a priori knowledge of the color and shape of the hand with the observations from image sequences. Based on this framework, we introduce a new feature, the likelihood edge, to combine color and edge information, and we use an ICP-based algorithm to find the maximum likelihood solution for the position and orientation of the hand in 3D space. The ICP-based iteration alleviates the influence of noise, and the robustness of the approach has been verified with extensive experiments. Since we use a 2D model to approximate the hand, the matching of edge points introduces considerable noise when extreme out-of-plane rotation of the hand occurs.
In order to drive a control using the recovered motion, it would be helpful to smooth the motion analysis results. We shall also extend our approach to articulated hand tracking by incorporating a local finger tracker.

Acknowledgments

This work is supported by National Science Foundation Grant IIS and by the National Science Foundation Alliance Program. The authors thank Ying Wu and John Lin for inspiring discussions and suggestions.

References

[Birchfield, 1998] Birchfield, S. (1998). Elliptical head tracking using intensity gradients and color histograms. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume 1.

[Black and Jepson, 1996] Black, M. and Jepson, A. (1996). EigenTracking: Robust matching and tracking of articulated objects using a view-based representation. In Proc. European Conference on Computer Vision, volume 1.

[Blake and Isard, 1998] Blake, A. and Isard, M. (1998). Active Contours. Springer-Verlag, London.

[Duda and Hart, 1973] Duda, R. and Hart, P. (1973). Pattern Classification and Scene Analysis. Wiley, New York.

[Faugeras et al., 2001] Faugeras, O., Luong, Q.-T., and Papadopoulo, T. (2001). The Geometry of Multiple Images. MIT Press, Cambridge.

[Hartley and Zisserman, 2000] Hartley, R. and Zisserman, A. (2000). Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge.

[Jensen, 1996] Jensen, F. V. (1996). An Introduction to Bayesian Networks. Springer-Verlag, New York.

[Jones and Rehg, 1999] Jones, M. and Rehg, J. (1999). Statistical color models with application to skin detection. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume I, Fort Collins.

[Fukunaga, 1990] Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition. Academic Press, New York, second edition.

[O'Hagan et al., 2001] O'Hagan, R., Zelinsky, A., and Rougeaux, S. (2001). Visual gesture interfaces to virtual environments. Interacting with Computers.

[Tao et al., 2000] Tao, H., Sawhney, H., and Kumar, R. (2000). Dynamic layer representation with applications to tracking. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume 2.

[Tsai and Huang, 1981] Tsai, R. and Huang, T. (1981). Estimating 3-D motion parameters of a rigid planar patch I. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(12).
[Tsai et al., 1982] Tsai, R., Huang, T., and Zhu, W. (1982). Estimating 3-D motion parameters of a rigid planar patch II: Singular value decomposition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 30(8).

[Weng et al., 1993] Weng, J., Ahuja, N., and Huang, T. (1993). Motion and Structure from Image Sequences. Springer-Verlag, New York.

[Weng et al., 1992] Weng, J., Huang, T., and Ahuja, N. (1992). Motion and structure from line correspondences: Closed-form solution, uniqueness, and optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14.

[Wu and Huang, 1999] Wu, Y. and Huang, T. S. (1999). Capturing articulated human hand motion: A divide-and-conquer approach. In Proc. IEEE International Conference on Computer Vision, Corfu, Greece.

[Wu and Huang, 2001] Wu, Y. and Huang, T. S. (2001). Robust visual tracking by co-inference learning. In Proc. IEEE International Conference on Computer Vision, pages 26-33, Vancouver.

[Zhang, 1994] Zhang, Z. (1994). Iterative point matching for registration of free-form curves and surfaces. International Journal of Computer Vision, 13(2):119-152.