CAMERAS AND GRAVITY: ESTIMATING PLANAR OBJECT ORIENTATION. Zhaoyin Jia, Andrew Gallagher, Tsuhan Chen


School of Electrical and Computer Engineering, Cornell University

ABSTRACT

Photography on a mobile camera provides access to additional sensors. In this paper, we estimate the absolute orientation of a planar object with respect to the ground, which can be a valuable prior for many vision tasks. To find the planar object orientation, our novel algorithm combines information from a gravity sensor with a planar homography that matches a region of an image to a training image (e.g., of a company logo). We demonstrate our approach with an iPhone application that records the gravity direction for each captured image. We find a homography that maps the training image to the test image, and propose a novel homography decomposition to extract the rotation matrix. We believe this is the first paper to estimate absolute planar object orientation by combining inertial sensor information with vision algorithms. Experiments show that our proposed algorithm performs reliably.

Index Terms — Image processing, Image motion analysis, Image detection

1. INTRODUCTION

Many mobile phones are equipped with inertial sensors, such as accelerometers and gyroscopes, that provide relatively accurate measurements of the phone's pose, such as its orientation with respect to the ground. This information is usually used to improve user interaction, for example, switching an application window to landscape when a user rotates the phone horizontally. In this paper, we combine the mobile device's camera with a gravity sensor and show that the gravity sensor data can estimate not only the pose of the phone itself, but also the orientation of a planar object captured by the camera. One example is shown in Fig. 1: given a planar target object to detect (Fig. 1 (a), the training view) and the gravity direction with respect to the camera (Fig. 1 (b)), we match the object in the test image (Fig. 1 (c), the test view), and estimate the orientation of the object with respect to the ground through a special homography decomposition, shown in Fig. 1 (d). To our knowledge, we are the first to combine a vision algorithm (planar matching) with an inertial sensor to determine the orientation of an object.

Fig. 1. (a) The training view (frontal, no distortion) of the target. (b) The testing setup. (c) The test image containing the target. (d) The homography from the training image to the test image. We compute the absolute orientation of the planar object, e.g., θ = 15°, indicating the angle between the target artwork and the ground plane.

The orientation of an object is a useful prior for many vision tasks. For example, even for the same image shown in Fig. 1 (a), if the image is displayed vertically (θ = 90°), it is more likely to be a painting hanging on a wall; if it is horizontal (θ = 0°), it may be a picture in a book lying on a table.

Related work: Hoiem et al. [1] show that estimating surface orientation helps occlusion boundary detection, depth estimation and object recognition. Further applications of estimating the camera position, vanishing lines and the horizon are presented in [2]. This prior work proposes a learning-based approach that finds surface orientations and roughly classifies surfaces as vertical or horizontal; our algorithm predicts the object orientation more precisely, in degrees, and incorporates a gravity sensor with pixel data. In addition, other applications with different goals combine inertial sensors with images for matching, photo enhancement and robotics, such as [3], [4] and [5], but none combine homography matching with gravity sensors to accurately measure the absolute orientation of a planar surface.

Our work is closely related to the homography decompositions presented in [6], [7], [8] and [9]. However, our setting and task are different: these algorithms use the intrinsic matrices of both cameras for the decomposition. In our case, although the mobile phone used for testing can be calibrated, the training images containing the targets to match carry no camera information: they can be images downloaded from the Internet, as shown in Fig. 1 (a). Our derivation shows that we can synthesize the camera matrix for the training view and still recover the target orientation, as long as the training image is frontal with no distortion, a usually valid assumption. We believe we are the first to combine this decomposition with a gravity sensor to estimate planar object orientation.

2. ALGORITHM

We introduce our definition of the variables in Fig. 2: two cameras (camera 0, which captures the training target image, and camera 1, which captures the test image) are related by a planar homography. The intrinsic matrices are K_0 for the training view and K_1 for the test image; the world coordinate frame is defined by camera 0's axes, i.e., camera 0 has identity rotation and zero translation. The planar object is placed at distance d with normal direction n. The extrinsic matrix of camera 1 is [R t]. The gravity vector g is provided in camera 1's coordinates during testing.

Fig. 2. Two views related by a homography, with the gravity vector.

The goal is to find the projection of the surface normal n in camera 1's coordinates, and then compute its angle with the gravity vector g. To do this, we must find the rotation matrix R from camera 0 to camera 1, despite the challenge that camera 0 has unknown internal parameters.

2.1. Two-view geometry with planar homography

For a homography H that maps points x_0 in the image of camera 0 to points x_1 in the image of camera 1, i.e., x_1 = H x_0, [6] shows that the induced planar homography H for the plane [n^T, d]^T between the two cameras is:

H = K_1 (R - t n^T / d) K_0^{-1}. (1)

We define \bar{H} as:

\bar{H} = K_1^{-1} H K_0 = R - t n^T / d. (2)

Unfortunately, we cannot directly decompose \bar{H} into R and t to get the pose of the target ([8] and [10]), because although we have the camera parameters K_1 for the testing view (camera 1), the training view's intrinsics K_0 are unknown.

2.2. Depth d and intrinsic matrix K_0

With the intrinsic matrix K_0 unknown, we make use of our assumptions about the training images: 1) they are frontal views of a planar object; 2) the camera has zero skew; 3) it also has square pixels. These assumptions hold for most planar targets, such as paintings, logos, and book or CD covers found online. Then K_0 becomes:

K_0 = \begin{bmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{bmatrix}. (3)

For any 3D point X that lies on the plane [n^T, d]^T, since camera 0 takes a frontal view of the planar object, X = [x, y, d, 1]^T. By definition, camera 0 has identity rotation and zero translation, so its extrinsic matrix is [I_{3x3} 0]. In homogeneous coordinates, the 2D projection x_0 of the 3D point X into camera 0 is:

x_0 = K_0 [I_{3x3} 0] [x, y, d, 1]^T \cong \bar{K}_0 [x, y, d/f]^T. (4)

We can rewrite K_0 as \bar{K}_0:

\bar{K}_0 = \begin{bmatrix} 1 & 0 & c_x \\ 0 & 1 & c_y \\ 0 & 0 & 1 \end{bmatrix}, (5)

where c = [c_x, c_y, 1]^T is the third column of \bar{K}_0, and we represent the depth as \bar{d}:

\bar{d} = d / f. (6)

For a frontal view of a planar object, the depth and the focal length are coupled: we get exactly the same image if we increase the focal length of the camera and move the object further away.
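To make the model concrete, here is a minimal numpy sketch (ours, not the authors') of the forward model in Eqs. (1)-(3): it builds a synthetic zero-skew, square-pixel training camera K_0 and a tilted test camera, then verifies Eq. (2). All numeric values are made up for illustration.

```python
import numpy as np

def induced_homography(K0, K1, R, t, n, d):
    """Planar homography of Eq. (1): H = K1 (R - t n^T / d) K0^{-1}."""
    return K1 @ (R - np.outer(t, n) / d) @ np.linalg.inv(K0)

# Hypothetical training camera (Eq. 3): zero skew, square pixels.
f, cx, cy = 800.0, 320.0, 240.0
K0 = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1.0]])
# Hypothetical calibrated test camera K1.
K1 = np.array([[700.0, 0, 360.0], [0, 700.0, 300.0], [0, 0, 1.0]])

# Frontal plane in camera 0's frame: normal n = [0, 0, 1]^T at depth d.
n = np.array([0.0, 0.0, 1.0])
d = 2.0

# Test-camera pose: a 15-degree tilt about the x-axis and a small translation.
a = np.deg2rad(15.0)
R = np.array([[1, 0, 0],
              [0, np.cos(a), -np.sin(a)],
              [0, np.sin(a), np.cos(a)]])
t = np.array([0.1, -0.2, 0.3])

H = induced_homography(K0, K1, R, t, n, d)
H_bar = np.linalg.inv(K1) @ H @ K0                  # Eq. (2)
print(np.allclose(H_bar, R - np.outer(t, n) / d))   # True
```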

With this derivation, the homography \bar{H} in (2) becomes:

\bar{H} = K_1^{-1} H \bar{K}_0 = R - \frac{f}{d} t n^T. (7)

2.3. Decompose \bar{H} to R

Fig. 3. Sample frontal views of the planar object images tested.

To decompose \bar{H}, we define the vectors u and v as:

u = [1, 0, 0]^T, v = [0, 1, 0]^T, (8)

and multiply them by \bar{H} in (7). For u, on the left side of (7) we have (using \bar{K}_0 u = u):

\bar{H} u = K_1^{-1} H \bar{K}_0 u = K_1^{-1} H u. (9)

On the right side, since n^T u = 0 (recall that in camera 0 the frontal view of the planar object has n = [0, 0, 1]^T), we have:

(R - \frac{f}{d} t n^T) u = R u. (10)

Therefore, combining (9) and (10), we have:

K_1^{-1} H u = R u. (11)

Following the same derivation for the vector v, we also have:

K_1^{-1} H v = R v. (12)

Since R is a rotation matrix and [u, v, u \times v] = I_{3x3}, we have:

R = [R u, R v, (R u) \times (R v)] = [K_1^{-1} H u, K_1^{-1} H v, (K_1^{-1} H u) \times (K_1^{-1} H v)]. (13)

In practice, due to noise in computing H and K_1, the resulting matrix R may not be orthogonal. We approximate R by the nearest orthogonal matrix in the Frobenius norm, i.e., we take the SVD of R in Eq. (13), R = U \Sigma V^T, and set the final R = U V^T.

Eq. (13) shows that, under our assumptions on the training images (frontal view, no distortion), the rotation matrix R is independent of the camera center c and focal length f of camera 0, the depth d and the translation t. Once we retrieve the homography H between the two views and know the intrinsic parameters K_1 of the testing camera 1, we can determine the rotation matrix R from camera 0 to camera 1.

2.4. Planar object orientation θ

During testing, the gravity direction g is given in camera 1's coordinates. To compute the planar object orientation θ with respect to the ground, we project the plane's normal vector n into camera 1's coordinates, i.e., R n, and calculate its angle to the gravity vector g. More formally, applying Eq. (13), we define the angle α as:

\alpha = \arccos(R n \cdot g), (14)

\theta = \alpha \text{ for } 0 \le \alpha \le \pi/2, \qquad \theta = \pi - \alpha \text{ for } \pi/2 < \alpha < \pi. (15)

3. EXPERIMENTS

An iPhone 4 is used for the experiments. The camera is calibrated to find K_1, and the inertial sensor data is recorded while capturing images. We assume the coordinate axes of the gravity sensor are aligned with the camera axes, and directly use the gravity vector from the inertial sensor as g. We use several different types of planar images, all downloaded, for the experiments: five famous paintings (Mona Lisa, The Girl with a Pearl Earring, Marriage of Arnolfini, The Music Lesson and Birth of Venus), a logo (Starbucks), and a CD cover (Beatles). We render each image on an iPad as the testing target, and capture the scene with the calibrated iPhone 4 camera to estimate the absolute orientation of the iPad. The ground-truth angle of the planar object is measured manually with a protractor. The homography is computed through SIFT point matching [11] and the RANSAC algorithm with DLT [6] (see the sketch below).

Fig. 4. Experiment setting with (a) fixed object orientation (θ_gt = 72.5°) and different camera positions, and (b) gradually increasing object orientation from 0° to 75° (θ_gt = 0° and 30° shown).

Three types of experiments are performed (see Fig. 4 for the first two): 1) we keep the planar object at a fixed angle with respect to the ground, and take images from different camera positions; 2) we keep the camera at a fixed angle, and gradually rotate the planar object from 0° to 75° with respect to the ground; 3) we capture images in the real world to show qualitative results.
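Putting Sections 2.3 and 2.4 together, the following is a minimal numpy sketch of the decomposition (Eqs. (8)-(15)); it assumes H maps training-image points to test-image points and that g is the gravity vector in camera 1's coordinates. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def rotation_from_homography(H, K1):
    """Eq. (13): recover R from the homography H (training -> test) and the
    test-camera intrinsics K1, then project onto the nearest rotation matrix."""
    A = np.linalg.inv(K1) @ H
    r1 = A[:, 0]                      # K1^{-1} H u, with u = [1, 0, 0]^T
    r2 = A[:, 1]                      # K1^{-1} H v, with v = [0, 1, 0]^T
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)       # nearest orthogonal matrix (Frobenius)
    return U @ Vt

def object_orientation(R, g):
    """Eqs. (14)-(15): angle between the rotated plane normal and gravity."""
    n = np.array([0.0, 0.0, 1.0])     # frontal training view => n = [0, 0, 1]^T
    g = g / np.linalg.norm(g)
    alpha = np.arccos(np.clip((R @ n) @ g, -1.0, 1.0))
    return alpha if alpha <= np.pi / 2 else np.pi - alpha
```

Note that the homography is estimated only up to scale, so A[:, 0] and A[:, 1] are merely proportional to Ru and Rv; the SVD projection absorbs that common scale. Feeding the synthetic H from the earlier sketch into rotation_from_homography recovers the rotation used there.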

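For the homography-estimation step used in the experiments (SIFT matching followed by RANSAC with DLT), a typical OpenCV sketch is shown below. The ratio-test threshold (0.75) and RANSAC reprojection threshold (5.0 pixels) are illustrative choices, not values from the paper.

```python
import cv2
import numpy as np

def estimate_homography(train_img, test_img):
    """SIFT matches + RANSAC-fitted homography mapping training-image
    points to test-image points (cf. Section 3)."""
    sift = cv2.SIFT_create()
    kp0, des0 = sift.detectAndCompute(train_img, None)
    kp1, des1 = sift.detectAndCompute(test_img, None)
    # Lowe's ratio test on 2-nearest-neighbor matches.
    matches = cv2.BFMatcher().knnMatch(des0, des1, k=2)
    good = [m for m, nn in matches if m.distance < 0.75 * nn.distance]
    src = np.float32([kp0[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H
```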
Fixed object, different camera positions: In this setting we keep the object at a fixed angle and take images from different positions; one example is shown in Fig. 4 (a). We take 10 images of each planar object at a given angle, and for each image we compute the homography 30 times using RANSAC, computing the angle θ from each homography. The results are shown in Fig. 5: the dashed lines indicate the ground-truth angles, and different colors indicate different images at a specific angle. Because RANSAC produces a different homography each time, the error bar shows the standard deviation of θ. Our proposed algorithm predicts the angle of the planar object with small errors, which may be introduced by misalignment of the gravity vector axes, the calibration of K_1, and especially the quality of the homography. One example is shown in Fig. 6: a good homography gives a much smaller error in the estimated orientation of the planar object.

Fig. 5. Experiment results for different object orientations.

Fig. 6. Homography quality affects the estimated orientation θ. The ground-truth orientation in (a) is θ_gt = 42.5°; the good homography in (b) (θ = 43.0°) gives a better prediction than the worse one in (c) (θ = 55.1°).

Fixed camera, different object orientations: In another test we keep the camera roughly fixed, but gradually increase the orientation angle θ of the planar object from 0° to 75°; one example is shown in Fig. 4 (b). The results are shown in Table 1. Our proposed algorithm still robustly estimates the object orientation in this case. Over these two scenarios, 88% of the test cases are within 5° of the ground truth, and 98% are within 10°.

gt (°)       0     15    30    45    60    75
obj1 mean    4.6   17.1  29.8  44.3  61.2  76.8
obj1 var     2.4   2.7   3.0   3.2   4.3   3.5
obj2 mean    7.7   13.8  29.0  45.0  56.8  72.2
obj2 var     3.5   4.4   4.1   2.4   2.1   1.8

Table 1. Results with one object rotating from 0° to 75°. gt is the ground-truth angle; obj1 is the Starbucks logo and obj2 is the Beatles CD cover. The table shows the mean and variance (var) for each image, obtained by sampling homographies with RANSAC.

Real-world example: We also test on real-world images of a Starbucks logo for qualitative results, shown in Fig. 7. Our proposed algorithm predicts the logo orientation θ accurately in different situations.

Fig. 7. Predicting the angle of the Starbucks logo in real-world images: (a) a Starbucks logo outside a cafe, θ = 87.5° (ground truth θ_gt = 90°); (b) a bag of coffee held in a hand, θ = 30.7° (θ_gt should be between 0° and 90°); (c) a menu on a table, θ = 6.6° (θ_gt = 0°). The θ under each figure is the prediction from our algorithm.

4. CONCLUSION

This paper estimates the absolute orientation of a planar object using a gravity sensor together with a mobile camera. We made weak assumptions about the training image, e.g., frontal view, zero skew and no distortion. During testing we estimate the rotation through a special homography decomposition, compute the projection of the planar object's normal vector, and take its angle to the gravity vector as the object orientation. Experiments showed that our proposed algorithm robustly predicts the object orientation in different scenarios. Future applications can be built on this algorithm, e.g., correcting the homography or improving object detection. Since we find the rotation matrix of the testing camera, our algorithm can also estimate the in-plane rotation of the planar object, which can be used for other applications, e.g., to predict whether the object is upside down.

5. REFERENCES

[1] D. Hoiem, A. A. Efros, and M. Hebert, "Closing the loop in scene interpretation," in CVPR, 2008.
[2] D. Hoiem, A. A. Efros, and M. Hebert, "Putting objects in perspective," IJCV, vol. 80, 2008.
[3] D. Kurz and S. Benhimane, "Inertial sensor-aligned visual feature descriptors," in CVPR, 2011.
[4] H. Lee, E. Shechtman, J. Wang, and S. Lee, "Automatic upright adjustment of photographs," in CVPR, 2012.
[5] J. Lobo and J. Dias, "Vision and inertial sensor cooperation using gravity as a vertical reference," PAMI, vol. 25, no. 12, 2003.
[6] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision (2nd ed.), Cambridge University Press, 2006.
[7] Y. Ma, S. Soatto, J. Kosecka, and S. Sastry, An Invitation to 3-D Vision, Springer, 2003.
[8] O. D. Faugeras and F. Lustman, "Motion and structure from motion in a piecewise planar environment," International Journal of Pattern Recognition and Artificial Intelligence, 1988.
[9] Z. Zhang and A. R. Hanson, "3D reconstruction based on homography mapping," in Proc. ARPA Image Understanding Workshop, 1996.
[10] E. Malis and M. Vargas, "Deeper understanding of the homography decomposition for vision-based control," 2008.
[11] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV, 2004.