

Local qualitative shape from stereo without detailed correspondence

Extended Abstract

Shimon Edelman
Center for Biological Information Processing, MIT E25-201, Cambridge MA 02139
Internet: edelman@ai.mit.edu

Introduction

Schemes for the extraction of qualitative shape information from stereo tend to relegate the solution of the correspondence problem to a preprocessing stage and to assume that their input is provided in the form of a matched image pair, that is, a disparity map [1]. The disparity map is a rich source of shape information, which allows the recovery of depth on a ratio scale [2] even when the camera geometry is unknown. Consequently, using the disparity map to extract qualitative rather than inexact quantitative shape information may be well justified by considerations of stability and robustness [3], but appears wasteful in the sense that a rich and difficult-to-compute representation is transformed into a relatively terse one (Figure 1).

Figure 1: Can at least partial qualitative shape information be obtained from a stereo pair without going through a detailed disparity map?

A similar question arises in several approaches to model-based 3D object recognition, where object and model features have to be matched before the object's viewpoint transformation, used subsequently to align the object's image with the model, can be recovered (e.g., [4]). In object recognition, the only alternative to detailed matching suggested so far is a class of methods that involve the computation of quantities that depend on entire objects rather than on their local features. One such method [5] exploits the observation that matrices formed by the 2D image-plane moments of rigid planar patch objects in 3D transform as tensors under a general viewpoint change (modeled, under orthographic projection, by an image-plane affine transformation).
As an approach to recognition, the moment-based methods suffer from

an inherent sensitivity to occlusion and to imprecise or noisy segmentation. The idea behind the moment approach can be incorporated, however, into a qualitative vision module that provides information about the sign of the Gaussian curvature of surface patches through the use of regional rather than detailed correspondence (representation by the sign of the Gaussian curvature has been proposed, e.g., in [6,7]).

The idea

An arbitrary 3D viewpoint change can be modeled by an image-plane (2D) affine transformation if and only if the viewed object is planar. Let p = (x, y, z)^T and p' = (x', y', z')^T be corresponding points in two views of the same object. Assume that p and p' are related by a 3D rotation R (the addition of a 3D translation does not change the argument):

$$
\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}
=
\begin{pmatrix}
r_{11} & r_{12} & r_{13} \\
r_{21} & r_{22} & r_{23} \\
r_{31} & r_{32} & r_{33}
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
$$

If the object is planar, that is, if for every point p = (x, y, z)^T the depth z = Ax + By + C, then the projections of p and p' are related by a 2D affine transformation:

$$
\begin{aligned}
x' &= r_{11}x + r_{12}y + r_{13}(Ax + By + C) = (r_{11} + r_{13}A)\,x + (r_{12} + r_{13}B)\,y + r_{13}C \\
y' &= r_{21}x + r_{22}y + r_{23}(Ax + By + C) = (r_{21} + r_{23}A)\,x + (r_{22} + r_{23}B)\,y + r_{23}C
\end{aligned}
\qquad (1)
$$

This transformation also relates the positions of the centroid (first-order moment tensor) of the planar patch as seen from the two viewpoints.

Suppose that the patch is, in fact, not planar but hyperbolic (saddle-like). If the image of such a patch is divided into, say, four sectors with the origin at the centroid, some of the sectors will be in part closer to the two cameras than the tangent plane at the centroid, and others will be farther than the tangent plane. Moreover, the "near" and the "far" sectors will alternate as one goes around the patch centroid (see footnote 1). Consequently, the direction from the actual location of a sector's centroid towards its location as predicted by the "global" affine transform will change four times (see footnote 2).
These changes can be detected by looking at the signs of the inner products of successive displacement vectors. The number of sign changes will be zero for elliptic or planar patches. Elliptic patches, however, can be detected by looking at the inside/outside difference in the signs of the relative depth values rather than at the sector-by-sector differences (see Figure 2B, right). A convex patch will have a positive (+) relative depth on the inside and a negative (−) one on the outside, and a concave patch vice versa. Note that the depth is compared to that of a planar approximation to the surface, which is more or less parallel to the tangent plane at the center, but does not necessarily coincide with it (Figure 2A).

Footnote 1: More than four sectors may have to be looked at, e.g., if the parabolic lines at the center of the patch form an acute angle.

Footnote 2: The idea here is similar to Weinshall's [7], who related the number of sign changes around the center in the output of a simple operator to the sign of the Gaussian curvature of the surface.
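As a sanity check on the planar case, Eq. (1)'s induced 2D affine map can be compared numerically against a direct 3D rotation of points on the plane z = Ax + By + C. The following is a minimal numpy sketch under orthographic projection; function and variable names are illustrative, not from the original text:

```python
import numpy as np

def planar_affine(R, A, B, C):
    """2D affine map induced on image coordinates when a plane
    z = A*x + B*y + C is rotated by R, under orthographic projection.
    Returns (M, t) such that [x', y'] = M @ [x, y] + t, per Eq. (1)."""
    M = np.array([[R[0, 0] + R[0, 2] * A, R[0, 1] + R[0, 2] * B],
                  [R[1, 0] + R[1, 2] * A, R[1, 1] + R[1, 2] * B]])
    t = np.array([R[0, 2] * C, R[1, 2] * C])
    return M, t

# Check against rotating a 3D point that lies on the plane.
theta = np.deg2rad(20)
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])  # rotation about y-axis
A, B, C = 0.3, -0.2, 1.5
M, t = planar_affine(R, A, B, C)

p2 = np.array([0.7, -0.4])                            # image point
p3 = np.array([p2[0], p2[1], A * p2[0] + B * p2[1] + C])  # lift onto plane
assert np.allclose(M @ p2 + t, (R @ p3)[:2])          # affine map agrees
```

The assertion passes for any point on the plane, which is exactly the "if" direction of the planarity claim above; for a non-planar patch the residual between the two sides is what the sector test exploits.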

Figure 2: (A) The "average" plane approximation to a convex patch. (B), left: a hyperbolic patch is characterized by an alternation of the signs of depth values, where depth is computed relative to the "average" planar approximation. The arrows show the displacements of the sector centroids relative to the locations predicted by the global affine transform (see text). Right: an elliptic patch yields an inside/outside difference in the sign of the relative depth value rather than a sector-by-sector one. Bottom: the classification algorithm.
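The classification procedure of Figure 2B (count sector-wise sign alternations of the relative depth to detect saddles, then compare inside/outside depth signs to detect elliptic patches) can be illustrated on synthetic depth samples. This is a simplified sketch, assuming analytic depth residuals in place of stereo measurements and sector boundaries offset 45 degrees from the saddle's parabolic lines; all names are illustrative:

```python
import numpy as np

def sector_sign_changes(pts, depths, n_sectors=4):
    """Count sign alternations of the mean relative depth over angular
    sectors around the patch centroid: 4 for a hyperbolic (saddle)
    patch, 0 for an elliptic or planar one."""
    # Offset sector boundaries by 45 deg so the saddle's parabolic lines
    # fall on sector borders rather than through sector centres.
    ang = (np.arctan2(pts[:, 1], pts[:, 0]) + np.pi / 4) % (2 * np.pi)
    edges = np.linspace(0, 2 * np.pi, n_sectors + 1)
    means = np.array([depths[(ang >= lo) & (ang < hi)].mean()
                      for lo, hi in zip(edges[:-1], edges[1:])])
    s = np.sign(means)
    return int(np.sum(s != np.roll(s, 1)))

def inside_outside_signs(pts, depths, r=0.5):
    """Signs of the mean relative depth inside vs. outside a central disc;
    opposite signs indicate an elliptic (convex or concave) patch."""
    inner = np.linalg.norm(pts, axis=1) < r
    return np.sign(depths[inner].mean()), np.sign(depths[~inner].mean())

rng = np.random.default_rng(0)
xy = rng.uniform(-1, 1, size=(400, 2))

saddle = xy[:, 0] ** 2 - xy[:, 1] ** 2   # depth relative to tangent plane
egg = xy[:, 0] ** 2 + xy[:, 1] ** 2      # elliptic patch, same reference
egg_rel = egg - egg.mean()               # residual from the "average" plane

print(sector_sign_changes(xy, saddle))   # 4 alternations: hyperbolic
print(sector_sign_changes(xy, egg))      # 0 alternations: not a saddle
print(inside_outside_signs(xy, egg_rel)) # opposite signs: elliptic
```

Only the signs of coarse regional averages are used, which is what makes the test workable without point-to-point correspondence.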

Underlying assumptions

The ability of the above method to come up with a correct qualitative description of the surface patch depends on several conditions:

- The "global" affine approximation to the transform relating the two input images must be known. One way to obtain such knowledge is through the use of the tensor method [5]. However, if the method is to be used in a binocular stereo setting, approximate knowledge of the interocular distance would suffice.

- Since the method substitutes distance along the line of sight for distance along the local normal to the surface, it works best when the two are close, i.e., when the slant and the tilt of the planar approximation to the surface are small. Alternatively, if the slant and the tilt are available independently (e.g., from a motion-based mechanism, or through a least-squares solution for A, B and C, given r_ij and several corresponding region centroids), the method can be modified to use them.

- While the method does not need feature-to-feature correspondence between the two frames, its performance depends on correct attribution of features to the sectors over which the centroids are computed (Figure 2B). Placing the origin of the reference system in each frame at the global centroid usually suffices for that purpose. Better yet, the origin may be placed at a prominent, well-localized feature, if one can be identified in the two frames.

Implementation by receptive fields

As shown in Figure 2, the operation on which the present method is based can be carried out by a hard-wired mechanism such as a binocular "retinal" receptive field (RF) that can compute image centroids over its subfields and compare the results from the two images. The classification of the surface at each location can then be computed by combining the output of a "saddle detector" with that of an "egg detector" (Figure 2B).
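The least-squares route to the slant/tilt parameters mentioned among the assumptions above is straightforward: with the rotation r_ij known, Eq. (1) is linear in A, B and C, so a handful of corresponding region centroids over-determines the plane parameters. A minimal numpy sketch, assuming orthographic projection and noise-free centroids; names are illustrative:

```python
import numpy as np

def plane_from_centroids(R, p, q):
    """Least-squares estimate of (A, B, C) in z = A*x + B*y + C from a
    known rotation R and corresponding region centroids p -> q (N x 2
    arrays). Each row of Eq. (1) yields one linear constraint per centroid:
    u - r_i1*x - r_i2*y = r_i3 * (A*x + B*y + C)."""
    x, y = p[:, 0], p[:, 1]
    blocks, rhs = [], []
    for i, u in ((0, q[:, 0]), (1, q[:, 1])):
        r3 = R[i, 2]
        blocks.append(np.column_stack([r3 * x, r3 * y, np.full_like(x, r3)]))
        rhs.append(u - R[i, 0] * x - R[i, 1] * y)
    sol, *_ = np.linalg.lstsq(np.vstack(blocks), np.concatenate(rhs),
                              rcond=None)
    return sol  # (A, B, C)

# Synthetic check: recover a known plane from five centroid correspondences.
def rot(ax, ay):
    ca, sa, cb, sb = np.cos(ax), np.sin(ax), np.cos(ay), np.sin(ay)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    return Rx @ Ry

R = rot(np.deg2rad(10), np.deg2rad(20))
A, B, C = 0.3, -0.2, 1.5
p = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [0.6, -0.8]])
q = np.array([(R @ [px, py, A * px + B * py + C])[:2] for px, py in p])
print(plane_from_centroids(R, p, q))  # ≈ [0.3, -0.2, 1.5]
```

Note the estimate degenerates when both r_13 and r_23 vanish (rotation about the optical axis), in which case the two views carry no slant/tilt information.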
Examples

Experiments with images of surface patches produced by a solid modeling system showed that the simple classification method outlined above works well under both orthographic and perspective projections, and tolerates global surface tilt and slant comparable to the largest difference in local normal orientation over the patch (that is, no part of the patch is allowed to be tangent to the line of sight because of the global tilt/slant). The method can also be applied to raw (gray-level) synthetic images (Figure 3).

Figure 3: Left: A synthetic stereo pair, showing a textured pear-shaped object, and the output of a hyperbolic patch detector (white = hyperbolic). The pear's neck has been correctly classified as most likely to be hyperbolic. Right: The same operator applied to two frames of a motion sequence of a rotating human head. The bridge of the nose has been correctly classified as hyperbolic.

Summary

I have described a simple method that, under certain assumptions, is capable of extracting crude qualitative shape information (namely, the sign of the Gaussian curvature) from stereo without detailed correspondence. The method is based on centroid computation, an operation that can be implemented by a receptive field, and can be applied directly to gray-level images. Although the method is attractively straightforward and can serve as part of a fast low-level qualitative shape module, it would probably be of little use in systems that can afford the computational effort of stereo matching and reconstruction. The question regarding the possibility of extracting more complex qualitative shape features without correspondence appears, at present, to have no definite answer.

References

[1] D. Weinshall. Application of qualitative depth and shape from stereo. In Proceedings of the 2nd International Conference on Computer Vision, pages 144-148, Tarpon Springs, FL, 1988. IEEE, Washington, DC.
[2] W. B. Thompson and J. K. Kearney. Inexact vision. In Workshop on Motion, Representation and Analysis, pages 15-22, 1986.
[3] J. J. Koenderink and A. J. van Doorn. Depth and shape from differential perspective in the presence of bending deformations. Journal of the Optical Society of America, 3:242-249, 1986.
[4] S. Ullman. Aligning pictorial descriptions: an approach to object recognition. Cognition, 32:193-254, 1989.
[5] D. Cyganski and J. A. Orr. Application of tensor theory to object recognition and orientation determination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7:662-673, 1985.
[6] P. J. Besl and R. C. Jain. Invariant surface characteristics for 3D object recognition in range images.
Computer Vision, Graphics, and Image Processing, 33:33-80, 1986.
[7] D. Weinshall. Direct computation of 3D shape and motion invariants. A.I. Memo No. 1131, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, May 1989.