Bayesian perspective-plane (BPP) with maximum likelihood searching for visual localization

DOI 10.1007/s11042-014-2134-8

Bayesian perspective-plane (BPP) with maximum likelihood searching for visual localization

Zhaozheng Hu & Takashi Matsuyama

Received: 8 November 2013 / Revised: 15 April 2014 / Accepted: 26 May 2014
© Springer Science+Business Media New York 2014

Abstract The Perspective-Plane problem proposed in this paper is similar to the well-known Perspective-n-Point (PnP) and Perspective-n-Line (PnL) problems in computer vision, but it has broader applications and potential, because planar scenes are more widely available than control points or lines in daily life. We address this problem in the Bayesian framework and propose the Bayesian Perspective-Plane (BPP) algorithm, which can deal with generalized constraints rather than type-specific ones. The BPP algorithm consists of three steps: 1) plane normal computation by maximum likelihood searching from the Bayesian formulation; 2) plane distance computation; and 3) visual localization. In the first step, computation of the plane normal is formulated within the Bayesian framework and solved with the proposed Maximum Likelihood Searching Model (MLS-M). Two searching modes, 2D and 1D, are discussed. MLS-M can incorporate generalized planar and out-of-plane deterministic constraints. With the computed normal, the plane distance is recovered from a reference length or distance, and the positions of the object or the camera can be determined afterwards. Extensions of the proposed BPP algorithm to un-calibrated images and to camera calibration are discussed. The BPP algorithm has been tested with both simulation and real image data. In the experiments, the algorithm was applied to recover planar structure and to localize objects by using different types of constraints, and the 2D and 1D searching modes were illustrated for plane normal computation. The results demonstrate that the algorithm is accurate and generalized for object localization. Extension of the proposed model to camera calibration was also illustrated, and the potential of the proposed algorithm was further demonstrated by solving the classic Perspective-Three-Point (P3P) problem and classifying its solutions. The proposed BPP algorithm suggests a practical and effective approach for visual localization.

Keywords Visual localization . Bayesian Perspective-Plane (BPP) . Generalized constraints . Maximum likelihood searching . Perspective-Three-Point (P3P)

Z. Hu (*): ITS Research Center, Wuhan University of Technology, Wuhan 430063, People's Republic of China. E-mail: zhaozheng.hu@gmail.com

Z. Hu, T. Matsuyama: Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan

1 Introduction

Visual localization has important applications in computer vision, virtual/augmented reality (VR/AR), multimedia modeling, human-computer interaction (HCI), motion estimation, navigation, and photogrammetry. The topic has been intensively investigated over the past decades [2,4,5,8,9,12]. The basic objective is to compute the position of an object or the camera in a reference coordinate system by using either multiple (two or more) images or a single image. For example, stereo vision is a commonly used approach for localization from two images, which retrieves the 3D coordinates of an object in the scene from the image disparity, baseline width, and camera focal length [8]. Compared to multi-view methods, localization from a single view usually relies on prior knowledge of the scene, and many algorithms have been developed for visual localization based on different scene constraints. However, these algorithms are usually type-specific and not generalized enough.

In the literature, many localization methods exploit different scene constraints. For example, model-based localization is a very popular approach, which exploits prior knowledge of a well-known model to estimate the depth, distance, or relative pose between object and camera. The model can be represented in different forms: a complete 3D map, e.g., a CAD model of the environment; a set of points or lines with known coordinates in a reference coordinate system (also known as control points or control lines); a well-defined landmark, including structured light patterns; a natural or artificial object; a specific shape; etc. Localization from a set of control points or lines has been formulated as the well-known Perspective-n-Point (PnP) and Perspective-n-Line (PnL) problems in the field of computer vision [6,10,11,15,18,25]. Over the past decades, many algorithms and computation models have been successfully developed to address the P3P (n=3), P4P (n=4), P5P (n=5), and P3L (n=3), P4L (n=4), P5L (n=5) problems. PnP and PnL research addresses two questions: 1) how to solve the problem; and 2) how to classify the multiple solutions. For n ≥ 6, the PnP and PnL problems have unique solutions, and existing approaches, such as the Direct Linear Transform (DLT) [8], solve them successfully. Landmark-based algorithms are another category for visual localization and have been intensively investigated, especially for mobile robot navigation. For easy recognition and position computation, landmarks are usually professionally designed with good features, such as dominant colors and textures, and pre-defined shapes [1,29]. Moreover, natural or artificial objects in the scene, such as the human body, face, and eyes, or windows in the street, can also be used for localization. For example, we can estimate the relative position between a camera and a human face from a single image by using a calibrated 3D face model [21], which has many applications in human-computer interaction (HCI). A single image of the human eyeballs can help to localize the eyes and compute the gaze directions, which has important applications for human attention analysis [19]. Zhu and Ramanan investigated the localization problem from landmarks in the wild [30]. Camera pose can be computed from a 3D corner, which is widely available in man-made buildings [2].
Urban buildings and roads can provide rich information for visual localization of mobile devices, which has important applications for navigation [27]. Symmetry is also an important clue for localization [23]. More generally, if a planar structure is determined, it is possible to localize the object and the camera with respect to each other. Planar scenes are commonly found in daily life; hence, localization based on planar structure is practical and flexible. In existing algorithms, planar structure, such as the plane normal or the plane distance, can be computed from circular points, parallel lines, homography, conics, control points, deterministic or statistical texture, etc. For example, Hu et al. determine the support plane from the geometric constraints of three control points [10].

If there are four or more control points on the plane, we can compute the homography and determine the position of the plane by decomposition [28]. It is also feasible to compute the plane normal from the vanishing line, if two sets of parallel lines are available in the scene [13]. Planar conics, such as circles, give other important clues for plane computation, which can be interpreted through the circular points [26]. Planar structure can also be calculated from video sequences using specific constraints [7]. Pretto et al. investigated the 3D localization of planar objects for industrial bin-picking [17]. Textures have been investigated for plane shape recovery, usually under prior assumptions on statistical characteristics, such as homogeneity or repetition of specific shapes [24].

From the literature, it can be found that existing single-view algorithms for localization or planar structure computation are mostly based on specific scene constraints. They are not generalized or practical enough, for two reasons. On the one hand, the strict scene constraints required by a type-specific algorithm are sometimes difficult to satisfy in the scene. On the other hand, constraints that are available in the scene may be difficult for a type-specific algorithm to utilize. As a result, these algorithms have limitations in practice, especially under occlusion and partial visibility. For example, a rectangle-based method is not feasible if one edge of the rectangular target is occluded; with the proposed method, however, we can still make use of the three angles between the three remaining edges for visual localization, because the proposed method exploits generalized scene constraints rather than specific shapes. Hence, the motivation of this paper is to propose a new model that can incorporate generalized scene constraints rather than type-specific ones for visual localization. The proposed model is expected to replace or complement existing localization methods by providing a unified framework that copes with different constraints.

In this paper, we propose a novel localization problem, Perspective-Plane, to address the issues arising from the constraint-specific methods in the literature. Perspective-Plane is similar to the PnP and PnL problems in computer vision, but with broader applications and potential, since planar scenes are more widely available in daily life. The Perspective-Plane problem deals with a planar object instead of particular control points or lines, and we propose the Bayesian Perspective-Plane (BPP) algorithm to solve it. The BPP algorithm can incorporate both planar and out-of-plane deterministic constraints to compute the planar structure and localize the object. Using the Bayesian formula, determination of the planar structure is formulated as a maximum likelihood problem, which is solved by the Maximum Likelihood Searching Model (MLS-M) on the Gaussian hemisphere. We also present a 1D searching mode to simplify and speed up the search. Localization is then accomplished from the recovered planar structure. The proposed model can also be extended to deal with un-calibrated images and camera calibration. Since planar scenes are commonly found in daily life, the proposed BPP algorithm is expected to be practical and flexible in many computer vision applications.
The contributions of this paper are summarized as follows: 1) we propose the concept of Perspective-Plane and address it within the Bayesian framework by proposing the Bayesian Perspective-Plane (BPP) algorithm, which, compared to the traditional PnP and PnL problems, can deal with more generalized constraints for visual localization; 2) we model the likelihood with normalized Gaussian functions and propose the Maximum Likelihood Searching Model (MLS-M) to solve for the plane normal by searching on the Gaussian hemisphere, with two searching modes (2D and 1D) for efficient computation; 3) we extend the proposed BPP method to un-calibrated images and to camera calibration.

The structure of the paper is organized as follows.

Section 2 introduces Perspective-Plane geometry and the formulation of planar and out-of-plane geometric constraints from scene prior knowledge; Section 3 proposes the Bayesian Perspective-Plane (BPP) algorithm with generalized constraints and the extensions of the model; Section 4 presents the experimental results with both simulation and real image data. Conclusions are drawn in the final section.

2 Formulation of geometric constraints

2.1 Geometry from a reference plane

Under a pin-hole camera model, a physical plane and its image are mapped by a 3×3 homography [1]:

x ≅ HX    (1)

where X denotes the coordinates of a point on the physical plane, x denotes its image coordinates, and ≅ means equality up to a scale. Given the camera calibration matrix and the plane structure, i.e., the plane normal and distance, there are two approaches to compute the homography. One is based on stratified reconstruction, by determining the vanishing line and the circular points [16]. The other is to assume a world coordinate system (WCS) on the physical plane [28]. For example, letting the X-O-Y plane of the WCS coincide with the physical plane, we derive two unit orthogonal directions N_1 and N_2, the directions of the x- and y-axes, from the plane normal N (i, j = 1, 2 and i ≠ j):

N^T N_i = 0,  N_i^T N_j = 0,  N_i^T N_i = 1    (2)

The third constraint in Eq. (2) forces N_1 and N_2 to be unit vectors. Eq. (2) does not determine N_1 and N_2 uniquely; in practice, we can set N_1 as

N_1 ≅ [N_z  0  −N_x]^T    (3)

where N = [N_x  N_y  N_z]^T. With N_1, the direction N_2 is readily computed from Eq. (2). As a result, we can compute the homography as [28]

H = [KN_1  KN_2  Kt]    (4)

where K is the calibration matrix, t is the translation vector between the WCS and the camera coordinate system, and Kt is the image of the origin of the WCS. With the homography, we can compute the coordinates X on the physical plane from the image point x as X ≅ H⁻¹x. Hence, a Euclidean reconstruction of the plane is possible, and further geometric attributes, such as distances, angles, length ratios, curvatures, and shape areas, are readily calculated. Note that a metric reconstruction of the plane is also feasible with known plane normal but unknown distance. In that case, all the absolute geometric attributes, such as coordinates and distances, are computed up to a common scale with respect to the actual ones, while the non-absolute geometric attributes, such as length ratios and angles, are equal to the actual ones, because the metric reconstruction differs from the Euclidean one by a similarity transform.

It is also feasible to compute geometric attributes associated with certain out-of-plane features by referring to a known plane. For example, Wang et al. show that an orthogonal plane can be measured, provided a reference plane is well determined [22].

Criminisi et al. show that if the vanishing line of a reference plane and one vanishing point along a reference direction are known, then object heights, distances, length ratios, angles, and shape areas on the parallel planes, etc., are readily computed [3]. From projective geometry, both the vanishing line of the plane and the vanishing point along the normal direction can be computed from the plane normal and the camera calibration matrix. Hence, the out-of-plane geometric attributes discussed above can be computed from a determined planar structure and the camera calibration matrix.

2.2 Geometric constraint formulation

Prior knowledge of the scene generates geometric constraints on the planar structure. We classify scene constraints into two categories: planar and out-of-plane constraints. Planar constraints are derived from prior knowledge of the geometric attributes of planar features, i.e., features that lie on the plane, such as points, lines, angles, curves, and shapes (see Fig. 1 (a)); out-of-plane constraints come from out-of-plane features, such as an orthogonal line or a known angle on a parallel plane (see Fig. 1 (b)). The formulation of both types of constraints is introduced as follows.

Assume a calibrated camera, so that the computation of the geometric attributes relies on the planar structure only. The plane normal determines the plane up to a scale, which allows the computation of many non-absolute geometric attributes, such as relative lengths and angles, without knowing the plane distance. Hence, we consider only non-absolute geometric attributes when calculating the plane normal; the plane distance is readily computed from a reference length once the plane normal is determined, as discussed in Section 3.4.

Let Q_i(N) be the geometric attribute computed from a given plane normal N, and let u_i be the deterministic value of this attribute that we know a priori. Their difference is defined as the measurement error associated with the normal N:

d_i(N) = Q_i(N) − u_i    (5)

By forcing zero measurement error, we derive one constraint C_i as

C_i : d_i(N) = Q_i(N) − u_i = 0    (6)

Note that Eq. (6) holds for both planar and out-of-plane constraints. Similarly, we can derive a set of geometric constraints on the plane normal:

C = { C_i : d_i(N) = Q_i(N) − u_i = 0 }  (i = 1, 2, …, M)    (7)

Fig. 1 a) Planar features, e.g., control point, control line, known angle, specific shape, etc., on the plane; b) out-of-plane features that lie out of the plane, e.g., orthogonal line, known angle on a parallel plane, point on an orthogonal plane, etc.
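To make Eqs. (5)-(7) concrete, the sketch below shows how a candidate normal N induces a measurable attribute Q_i(N): image points are back-projected onto the plane at an assumed unit distance (the convention later used in Section 3.4), where angles and length ratios take their true values. This is a minimal numpy illustration under our own naming, not code from the paper.

```python
import numpy as np

def backproject_unit_distance(K, N, x):
    # Intersect the back-projection ray K^-1 [x, 1] with the plane N^T X = 1;
    # angles and length ratios on this up-to-scale reconstruction are exact.
    ray = np.linalg.solve(K, np.array([x[0], x[1], 1.0]))
    return ray / (N @ ray)

def angle_attribute(K, xa, xb, xc):
    # Q_i(N): the planar angle at xb subtended by xa and xc (in degrees).
    def Q(N):
        A, B, C = (backproject_unit_distance(K, N, p) for p in (xa, xb, xc))
        v1, v2 = A - B, C - B
        c = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
        return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))
    return Q

def ratio_attribute(K, xa, xb, xc, xd):
    # Q_i(N): the length ratio |AB| / |CD| of two segments on the plane.
    def Q(N):
        A, B, C, D = (backproject_unit_distance(K, N, p) for p in (xa, xb, xc, xd))
        return np.linalg.norm(A - B) / np.linalg.norm(C - D)
    return Q
```

Pairing such a closure Q with its known value u_i yields one constraint C_i of Eq. (7).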

The plane normal can be computed by solving Eq. (7), which is, however, not an easy task. Existing algorithms usually rely on specific constraints to solve Eq. (7) and are not generalized or practical enough. A generalized algorithm, Bayesian Perspective-Plane (BPP), is discussed as follows.

3 Bayesian perspective-plane (BPP)

3.1 Localization from planar structure

Localization is possible from a single image of a planar structure, i.e., with known plane normal and distance, as illustrated in Fig. 2. A point on the plane can be localized by computing the intersection of the back-projection ray with the plane:

X = λK⁻¹x,  X^T N = d    (8)

where the first equation in Eq. (8) defines the back-projection ray from the camera calibration matrix and the image point (see the dashed line in Fig. 2), and the second is the equation of the known plane (N, d). In a similar way, we can localize all the points on the plane. If we use the plane to define a reference coordinate system, the position of the camera can be computed as well [28]. Hence, we can localize the object and the camera with respect to each other, and the key is to determine the plane normal and distance.

Fig. 2 Localizing a 3D point X in the camera coordinate system from a single view of a known plane, by computing the intersection with the back-projection ray

3.2 Formulation of Bayesian perspective-plane (BPP)

We assume the plane normal N has a certain distribution over the searching space. If no geometric constraints are given, N is uniformly distributed, as shown in Fig. 3 (a). Given one geometric constraint, the distribution of the normal, P(N|C_i), is updated from uniform to non-uniform, as shown in Fig. 3 (b). Given a set of geometric constraints, the plane normal has a new distribution, P(N|C_1, C_2, …, C_M), with a dominant peak if the constraints are sufficient.

The plane normal with the highest probability then gives the solution (see Fig. 3 (c)).

Fig. 3 Distributions of the plane normal in different conditions: a) uniform distribution P(N) with no constraints; b) non-uniform distribution P(N|C_i) from one geometric constraint; c) distribution P(N|C_1, C_2, …, C_M) with a dominant peak from a set of geometric constraints

Note that we use probabilities instead of a probability density function (p.d.f.) throughout the paper, because we partition the Gaussian hemisphere to discretize the space for maximum likelihood searching. Based on the above considerations, computation of the plane normal from a set of geometric constraints is formulated as maximizing the conditional probability

N* = arg max_N P(N | C_1, C_2, …, C_M)    (9)

However, it is difficult to solve Eq. (9) directly. Using the Bayesian formula, we can transform Eq. (9) into

P(N | C_1, C_2, …, C_M) = P(C_1, C_2, …, C_M | N) P(N) / P(C_1, C_2, …, C_M)    (10)

where P(C_1, C_2, …, C_M | N) is the likelihood, and P(N) and P(C_1, C_2, …, C_M) are the prior probabilities of the plane normal and of the geometric constraints, respectively. Assume that the constraints C_i (i = 1, 2, …, M) are conditionally independent of each other, so that

P(C_1, C_2, …, C_M | N) = ∏_{i=1}^{M} P(C_i | N)    (11)

Assume also that the plane normal is uniformly distributed, so that P(N) / P(C_1, C_2, …, C_M) is a constant. Substituting Eq. (11) into Eq. (10) yields the relative probability

P(N | C_1, C_2, …, C_M) ∝ ∏_{i=1}^{M} P(C_i | N)    (12)

Substituting Eq. (12) into Eq. (9) yields

N* = arg max_N ∏_{i=1}^{M} P(C_i | N)    (13)

Therefore, solving Eq. (13) is equivalent to finding the normal that yields the maximum joint likelihood. In order to solve Eq. (13), we need to model P(C_i | N), which is defined as the likelihood that the i-th constraint is satisfied given the plane normal N.

In order to develop a reasonable and accurate model, we adopt the following rules: 1) the likelihood is determined by the measurement error: the larger the absolute measurement error, the lower the likelihood, and the maximum likelihood is reached when the measurement error is zero; 2) the maximum likelihoods (at zero measurement error) of the different constraints should be equal, so that all constraints contribute equally to Eq. (13); 3) the measurement error should be normalized, to cope with geometric attributes of different forms, units, and scales. To serve these purposes, we propose the normalized Gaussian function to model the likelihood:

G(d_i(N); u_i, σ_i) = (1 / (√(2π) σ_i)) exp(−(Q_i(N) − u_i)² / (2u_i²σ_i²)) = (1 / (√(2π) σ_i)) exp(−d_i²(N) / (2u_i²σ_i²))    (14)

where u_i from Eq. (5) is used to normalize the measurement error. Note that Eq. (14) expects a non-zero u_i; otherwise, the measurement error is not normalized. It can be observed from Eq. (14) that the first rule is satisfied: the likelihood decreases with the absolute measurement error, and the maximum likelihood is reached at zero measurement error. To satisfy the second rule, the standard deviations of all the Gaussian models should be equal (σ_i = σ_j = σ); in practice, we choose an appropriate σ. As a result, the likelihood model for each constraint is

P(C_i | N) = G(d_i(N); u_i, σ) ∝ exp(−d_i²(N) / (2u_i²σ²))    (15)

With this likelihood model, we can re-organize Eq. (9) by substituting Eq. (15) into Eq. (12):

P(N | C_1, C_2, …, C_M) ∝ ∏_{i=1}^{M} G(d_i(N)) ∝ ∏_{i=1}^{M} exp(−d_i²(N) / (2u_i²σ²))    (16)

Finally, we derive the following equation to compute the plane normal:

N* = arg max_N P(N | C_1, C_2, …, C_M) = arg max_N exp(−∑_{i=1}^{M} d_i²(N) / (2u_i²σ²))    (17)

3.3 Maximum likelihood searching model (MLS-M)

A maximum likelihood searching model (MLS-M) is proposed to solve Eq. (17), with two searching modes, 2D and 1D, described below.

a. 2D searching mode

A unit plane normal corresponds to a point on the Gaussian sphere, so the Gaussian sphere defines the searching space. In practice, we search on the Gaussian hemisphere, which halves the searching space, because only planes in front of the camera are visible in the image.

Once the searching space is defined, we partition it to sample the plane normal. A uniform sampling can be realized by evenly partitioning the Gaussian hemisphere into a number of cells; several algorithms exist for this purpose, and in this paper we use the recursive zonal equal-area sphere partitioning algorithm of [14]. After partitioning, the center of each cell represents one sampled normal. The likelihood of each sampled normal is computed using Eq. (17), and the maximum likelihood is found by sorting; the corresponding plane normal is the solution. Since the Gaussian sphere in 3D space is two-dimensional (2D), we call this the 2D searching mode.

b. 1D searching mode

A linear constraint can reduce the 2D search to 1D and enhance the searching efficiency. In practice, linear constraints are widely available in the scene, e.g., from parallel lines, orthogonal lines, or orthogonal planes. For example, a linear constraint on the plane normal can be formulated from an orthogonal line. Let L be the orthogonal line, with image l. Because L is parallel to the plane normal, the vanishing point V_normal along the normal direction lies on l and satisfies

l^T V_normal = l^T (KN) = (K^T l)^T N = 0    (18)

The term K^T l represents the re-projection plane passing through the camera center and the image line l, on which the normal vector lies. Since N also lies on the Gaussian sphere, we derive

(K^T l)^T N = 0,  N^T N = 1    (19)

Eq. (19) defines a circle in 3D space, the intersection of this plane with the Gaussian sphere (see Fig. 4). We can thus search on a 1D circle (actually half of the circle) instead of the 2D Gaussian hemisphere. A uniform sampling is implemented to discretize the space (see APPENDIX A). Similarly, we compute the likelihood of each sampled normal and search for the maximum likelihood to compute the plane normal. Other linear constraints, such as a vanishing point on the plane or an orthogonal plane, can also reduce the 2D search to 1D by following similar steps.

Fig. 4 1D searching on a unit circle in 3D space, which is the intersection of the Gaussian sphere and a plane
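The 2D searching mode admits a compact implementation. The sketch below is ours: it samples the hemisphere with a Fibonacci lattice instead of the recursive zonal equal-area partition of [14] (an assumption made for brevity; both give near-uniform cells), and maximizes the logarithm of the joint likelihood in Eq. (17).

```python
import numpy as np

def hemisphere_samples(n):
    # Quasi-uniform unit normals with positive z (the sign convention for
    # "planes facing the camera" is an assumption); z uniform in (0, 1]
    # makes the latitude rings equal-area.
    i = np.arange(n)
    z = (i + 0.5) / n
    phi = i * np.pi * (3.0 - np.sqrt(5.0))   # golden-angle spacing
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def mls_2d(constraints, sigma, n=20400):
    # constraints: list of (Q, u) pairs from Eqs. (5)-(7).
    # Maximizing Eq. (17) equals minimizing sum_i d_i^2 / (2 u_i^2 sigma^2).
    best_N, best_log = None, -np.inf
    for N in hemisphere_samples(n):
        log_l = -sum((Q(N) - u) ** 2 / (2.0 * u * u * sigma * sigma)
                     for Q, u in constraints)
        if log_l > best_log:
            best_log, best_N = log_l, N
    return best_N
```

Here Q would be, e.g., one of the attribute closures from the sketch in Section 2.2, with the image measurements bound inside it.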

3.4 Plane distance determination and localization

Assume unit plane distance and re-write Eq. (8) as

X′ = λK⁻¹x,  X′^T N = 1    (20)

Comparing Eq. (8) and Eq. (20), the computed coordinates equal the actual ones up to a scale: X ∝ X′. As a result, we can compute the 3D coordinates of all the points on the plane and localize them in the camera coordinate system, all up to a common scale. To obtain the actual coordinates and distances, we need a reference length or distance. Assuming the actual length L_AB between two points A and B is known, the scale factor is determined by

β = L_AB / ‖X′_A − X′_B‖    (21)

where X′_A and X′_B are the 3D coordinates of A and B from Eq. (20) under the unit-plane-distance assumption, and ‖X′_A − X′_B‖ is their distance. Once the scale factor is determined, the actual plane distance is d = β, and the actual coordinates for localization are computed as

X = βX′    (22)

3.5 Extensions of the BPP algorithm

So far we have assumed a calibrated camera, so that the geometry computation relies on the planar structure only. However, the BPP algorithm can be extended to deal with un-calibrated images by assuming minimal camera calibration, i.e., zero skew, unit aspect ratio, and a central principal point. The geometry computation is then determined by both the plane normal and the focal length, and the measurement error becomes

d_i(N, f) = Q_i(N, f) − u_i    (23)

As a result, we re-formulate Eq. (9) by incorporating both the plane normal and the focal length:

(N*, f*) = arg max_{N, f} P(N, f | C_1, C_2, …, C_M)    (24)

The focal length is assumed independent of the normal, with a uniform distribution. We can use the Bayesian formula to transform Eq. (24) and apply the Gaussian likelihood model:

(N*, f*) = arg max_{N, f} exp(−∑_{i=1}^{M} d_i²(N, f) / (2u_i²σ²))    (25)

To find the maximum likelihood, we need to search for both the plane normal and the focal length; hence, a 3D search is required. However, as discussed above, if one linear constraint on the plane normal is available, the 3D search can be reduced to 2D.

A further extension of the proposed model is to camera calibration. We start from the assumption that the plane vanishing line is known, the same assumption as used in [7,13]. Given the camera focal length, we can compute the plane normal from the vanishing line and then compute the geometric attributes. As a result, we formulate camera calibration as maximizing the conditional probability

f* = arg max_f P(f | C_1, C_2, …, C_M)    (26)

Once again, we can apply the Gaussian likelihood model and derive the following equation for camera calibration (focal length computation):

f* = arg max_f exp(−∑_{i=1}^{M} d_i²(f) / (2u_i²σ²))    (27)

where the measurement error for a candidate focal length is given by

d_i(f) = Q_i(f) − u_i    (28)

Note that Eq. (27) can also incorporate both planar and out-of-plane constraints for camera calibration, and is thus more generalized than the algorithms proposed in [7,16]. The proposed camera calibration method is illustrated in the experiments.

4 Experimental results

The proposed BPP algorithm was tested with both simulation and real image data, using planar and out-of-plane constraints. The extension of the proposed model to camera calibration was also tested. Furthermore, the proposed model was applied to solve the well-known Perspective-Three-Point (P3P) problem in the real image tests. Although the proposed algorithm is designed to deal with generalized deterministic constraints, it is not feasible to cover all types of constraints in the experiments. Without loss of generality, we chose two types: 1) known length ratio; and 2) known angle, which are also two fundamental geometric attributes in measurement.

4.1 Results with simulation data

A simulated pin-hole camera was used to generate the image data. The camera has a focal length of 1800 pixels, unit aspect ratio, zero skew, and principal point at [800, 1000]^T (in pixels). Random Gaussian noise with zero mean and a standard deviation of half a pixel was added to the image coordinates to model practical image noise. The BPP algorithm was first tested using planar geometric constraints. Three constraints were used: two from two known angles (10° and 20°), and one from the known length ratio (valued 2) of two segments. A physical plane with normal [0.5612, 0.3333, 0.7576]^T and a distance of 72 cm was projected onto the imaging plane by the simulated camera. The MLS-M model was applied to compute the plane normal. The 2D searching mode was used, and the Gaussian hemisphere was partitioned into 20 400 cells, with each cell center representing one sampled plane normal. The likelihood of each sampled normal was computed with Eq. (15) from the three geometric constraints.

Due to the Gaussian hemisphere sampling, we may not obtain the exact normal. Hence, we define the best normal as the sampled normal with the minimum angle to the actual one. In this test, the best normal is [0.5641, 0.3336, 0.7553]^T, which makes a minimum angle of 0.21° with the actual one. Figures 5 (a)-(c) show the likelihoods from the three constraints, where the image intensity represents the likelihood (high intensity for high likelihood). It can be observed that the likelihood of the plane normal is non-uniform under the different geometric constraints. The joint likelihood from the three constraints is presented in Fig. 5 (d), in which a dominant area of high likelihood can be observed (see the bright area on the bottom right). Fig. 5 (e) shows the positions of the 300 normals with the highest likelihoods. The maximum likelihood was then searched in the joint likelihood map, with its position marked by "+" (see Fig. 5 (f)); the corresponding plane normal is [0.559, 0.336, 0.764]^T. In Fig. 5 (f), the position of the best normal is also marked by "o". The normal computed by the proposed model makes angles of 0.25° and 0.45° with the actual and the best normal, respectively. These results show that MLS-M is accurate.

Fig. 5 (a)(b)(c) The likelihoods from the three geometric constraints; (d) the joint likelihood; (e) the positions of the 300 plane normals with the highest likelihoods; (f) positions of the computed and the best normal, marked by cross "+" and circle "o", respectively

The second simulation experiment tested the proposed model using both planar and out-of-plane constraints. The same simulated camera was used. The physical plane has normal direction [0.2374, 0.912, 0.3322]^T and a distance of 167 cm to the camera center. Three geometric constraints were used for plane structure computation and localization: one planar constraint from a known angle (45°), and two out-of-plane constraints derived from 1) a line orthogonal to the plane and 2) a known length ratio (valued 0.5) between two segments on a parallel plane. The 2D searching mode was first applied, with the Gaussian hemisphere again partitioned into 20 400 cells. Figure 7 shows the likelihood of each sampled normal under the different constraints. Figs. 7 (a)-(c) show the likelihoods generated from the three individual constraints, which clearly exhibit non-uniform distributions of the plane normal. The joint likelihood from the three constraints is presented in Fig. 7 (d), in which a dominant bright area in the upper right corner can be clearly observed, showing that the plane normal has a dominant-peak distribution under the three constraints. The positions of the 300 plane normals with the highest likelihoods are shown in Fig. 7 (e) to highlight this area. From the joint likelihood map in Fig. 7 (f), the maximum likelihood was computed.

The corresponding normal position is marked by "+", and the position of the best normal by a circle "o"; they coincide, i.e., the proposed model found the best normal in this test.

Fig. 6 Localization results: a) 3D positions (marked by "+") of 100 points on the plane in the camera coordinate system (in cm), with the actual positions marked by "o"; b) relative errors of the point-to-camera distance computation

With the orthogonal-line constraint, the 1D searching mode is also feasible. We first defined the circle to search by using Eq. (19). Afterwards, we sampled 1,000 points within the searching space (half of the circle in 3D space). The likelihoods were computed for each sampled normal by using Eq. (15) and Eq. (17). Figs. 8 (a) and (b) illustrate the likelihoods computed from the planar constraint of the known angle and the out-of-plane constraint of the known length ratio on the parallel plane. Multiple peaks can be observed in both likelihood curves, indicating multiple solutions from each individual constraint; more on such multiple solutions can be found in [26]. Fig. 8 (c) shows the joint likelihood, in which a unique peak can be observed. The plane normal was then calculated as [0.2474, 0.911, 0.3313]^T, which makes angles of 0.3° and 0.1° with the actual and the best normal, respectively.

Fig. 7 (a)(b)(c) Likelihoods from the three geometric constraints of known angle, known angle, and known length ratio; (d) joint likelihood; (e) positions of the 300 plane normals with the highest likelihoods; (f) positions of the computed and the best normal, marked by cross "+" and circle "o", respectively
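The 1D mode used above can be sketched in the same style: the circle of Eq. (19) is sampled uniformly through the rotation construction of Appendix A, and the same joint likelihood is evaluated on the samples. The helper names below are ours.

```python
import numpy as np

def circle_samples(K, l, n=1000):
    # Eq. (19): candidate normals lie on the intersection of the plane
    # (K^T l)^T N = 0 with the unit sphere.
    axis = K.T @ l
    axis /= np.linalg.norm(axis)
    # Two arbitrary orthonormal vectors spanning the plane of the circle
    # (the first two columns of R in Eq. (30)).
    helper = np.array([1.0, 0.0, 0.0]) if abs(axis[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    r1 = np.cross(axis, helper)
    r1 /= np.linalg.norm(r1)
    r2 = np.cross(axis, r1)
    theta = np.linspace(0.0, np.pi, n, endpoint=False)  # half circle suffices
    return np.outer(np.cos(theta), r1) + np.outer(np.sin(theta), r2)

def mls_1d(K, l, constraints, sigma, n=1000):
    # Same objective as mls_2d, restricted to the 1D circle of normals.
    best_N, best_log = None, -np.inf
    for N in circle_samples(K, l, n):
        log_l = -sum((Q(N) - u) ** 2 / (2.0 * u * u * sigma * sigma)
                     for Q, u in constraints)
        if log_l > best_log:
            best_log, best_N = log_l, N
    return best_N
```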

Fig. 8 1D searching mode: the likelihoods from the two constraints (a, b) and from the joint constraints (c)

Once the plane normal was calculated, we computed the plane distance by referring to a known length on the plane, and the points on the plane were then localized. Figure 9 (a) shows the reconstructed 3D positions of one hundred points, and Fig. 9 (b) the relative errors of the point-to-camera distance computation. The relative distance errors are less than 0.8 %, with a mean of 0.7 %. Compared to the results in Fig. 6, the mean relative error increases by 0.3 %, because the plane normal is more tilted with respect to the camera optical axis and the sampled points are farther from the camera.

Fig. 9 Localization results: a) 3D positions (marked by "+") of 100 points in the camera coordinate system (in cm), with the actual positions marked by "o"; b) relative errors of the point-to-camera distance computation
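The distance recovery and localization step of Section 3.4 (Eqs. (20)-(22)), used to produce the results above, reduces to a few lines. A minimal sketch under the same unit-distance back-projection convention as before; the function name and interface are ours.

```python
import numpy as np

def localize(K, N, img_pts, ref_pair, ref_length):
    # Eq. (20): plane points under the unit-plane-distance assumption.
    X = []
    for x in img_pts:
        ray = np.linalg.solve(K, np.array([x[0], x[1], 1.0]))
        X.append(ray / (N @ ray))
    X = np.array(X)
    # Eq. (21): scale factor from a known length between two of the points.
    a, b = ref_pair
    beta = ref_length / np.linalg.norm(X[a] - X[b])
    # Eq. (22): metric coordinates; the plane distance is d = beta.
    return beta * X, beta
```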

Finally, the proposed BPP algorithm was compared with an existing localization method on simulation data. We chose the classic homography-based method [28], which is also the foundation of many other localization methods. For a fair comparison, we used the same constraints for both methods: four control points with known coordinates on a reference plane. The homography-based method was realized as follows. First, the homography between the reference plane and its image was computed linearly from the four control points by using the DLT method [8,28]. Second, the rotation and translation were calculated from the computed homography and the camera calibration matrix. Finally, the normal of the reference plane was calculated from the rotation. For robust estimation of the homography matrix, the coordinates of the control points were normalized. For the proposed BPP method, we derived six length ratios and four angles from the control points as the constraints, and partitioned the Gaussian hemisphere into 25 500 cells. In the test, the images of the control points were corrupted with random zero-mean Gaussian noise at four noise levels (standard deviations) of 0.5, 1.0, 1.5, and 2.0 pixels. Both methods were tested against the different noise levels, using the angle between the computed and the actual plane normal as the error measure. For each noise level, we ran 200 trials with each method and computed the means and standard deviations of the errors. The results are illustrated in Fig. 10, where Fig. 10 (a) shows the means and Fig. 10 (b) the standard deviations. As can be observed, the proposed method performs better than the homography-based method; in particular, as the noise level increases, the proposed BPP method is more accurate and more robust. For example, at a noise level as high as 2.0 pixels, BPP has a mean error of about 0.5°, while the homography-based method has a mean error of 5.5°; the corresponding standard deviation of BPP is less than 1.0°, versus 5.5° for the homography-based method. These results demonstrate that the proposed BPP is more accurate and robust than the existing homography-based method.

Fig. 10 Comparison of the proposed BPP with the homography-based method: a) the mean computation errors; b) standard deviations of the computation errors

Fig. 11 Localization from planar constraints: a) original image; b) three constraints (two known angles and one length ratio) from extracted lines and points
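For reference, the error measure of this comparison, the angle between the computed and the actual plane normal, can be written as below. This is our reading of the text, not code from the paper; the absolute value makes the measure sign-invariant, since N and −N describe the same plane.

```python
import numpy as np

def normal_angle_deg(N_est, N_true):
    # Angle between two plane normals, ignoring the orientation sign.
    c = abs(np.dot(N_est, N_true))
    c /= np.linalg.norm(N_est) * np.linalg.norm(N_true)
    return np.degrees(np.arccos(np.clip(c, 0.0, 1.0)))
```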

Fig. 12 (a)(b)(c) Likelihoods from the three geometric constraints; (d) the joint likelihood from the three constraints; (e) positions of the 300 normals with the highest likelihoods; (f) position of the maximum likelihood, marked by "+"

4.2 Results with real image data

A Nikon digital camera was used to generate the real image data in the first three real-image experiments. The images have a resolution of 2218×1416 pixels. The camera was calibrated with Zhang's calibration algorithm; the calibration results show a focal length of 1369.2 pixels, an aspect ratio of 1.1, zero skew, and principal point at [1079, 720]^T (in pixels). The calibration results were used for the planar structure recovery and localization, and the calibrated focal length also served as ground truth to validate the proposed calibration algorithm. Note that in the following tests we assume that the image features, such as lines, points, and corners, were well extracted using image processing techniques.

The first real-image experiment computed the plane structure and localized points using planar constraints. Figure 11 (a) shows an original image of a white A4-size paper attached to a wall, on which six black line segments were drawn.

Fig. 13 Localized points and lines in the camera coordinate system (unit in mm)

From the line segments, we derived two angles, between the lines L1-L2 and L3-L4, and one length ratio between the two segments M1-M2 and M3-M4 (see Fig. 11 (b)). These were formulated into three planar constraints, and the MLS-M model was then applied to compute the plane normal. The likelihood map from each individual constraint is presented in Figs. 12 (a)-(c), which clearly show the non-uniform distribution of the plane normal under the different constraints. The joint likelihood map from the three constraints is given in Fig. 12 (d), in which a bright area at the bottom can be observed; this area is further highlighted in Fig. 12 (e), which shows the positions of the plane normals with the highest likelihoods. The position of the maximum likelihood in the joint likelihood map is marked by a cross "+" (see Fig. 12 (f)); the corresponding plane normal is [0.0993, 0.1679, 0.9808]^T. The plane distance was then computed by referring to the length of P1-P2 (297 mm, see Fig. 13), and all the points were localized in the camera coordinate system. Figure 13 shows the localization of the points on the six line segments and of the four corner points of the paper (P1, P2, P3, and P4, marked with circles "o"). The results were validated as follows: we used the calculated 3D coordinates to compute the lengths of P2-P3, P3-P4, and P1-P4, which are 206.7 mm, 294.6 mm, and 211.8 mm, respectively. Against the ground truth (a standard A4 sheet of 210 mm × 297 mm), the absolute errors are 3.3 mm, 2.4 mm, and 1.8 mm, corresponding to relative errors of 1.6 %, 0.8 %, and 0.9 %. This demonstrates that the plane normal computation and the localization are accurate.

Fig. 14 a) Original image of a book for localization; b) three lines (L1, L2, L3) spanning three planes

A second real-image experiment tested the proposed model using both planar and out-of-plane constraints with the 1D searching mode.

Fig. 15 The likelihoods for the normals of the three planes defined by: a) L1-L2; b) L1-L3; c) L2-L3

As shown in Figs. 14 (a) and (b), there is an orthogonal corner structure on the book. Every two of the orthogonal lines define a plane, so in total we have three orthogonal planes, and for each plane there are two constraints: 1) a line orthogonal to the plane; and 2) a right angle on the plane. The orthogonal-line constraint allows the 1D searching mode to be used for plane normal computation. We applied the proposed MLS-M model to compute the normal directions of these three planes from the two constraints per plane. The plane normal was computed in the 1D searching mode: we uniformly sampled 1,000 points within the searching space (half of the circle in 3D space), each point representing a sampled normal, so the angle between two neighboring sampled normals is 0.18°. The likelihood of each sampled normal was computed with Eq. (17). The results of the normal computation for the three planes are shown in Fig. 15. One important phenomenon, the multiple solutions for each plane, can be observed in Figs. 15 (a)-(c): in each likelihood curve there are two peaks, indicating two solutions for the plane normal. More about the multiple solutions for pose computation can be found in [2], and the results with the proposed model comply with the conclusions in that literature. Hence, we have in total eight combinations for the three orthogonal planes, two of which satisfy the mutually orthogonal constraints, and the proposed algorithm can classify and yield all the possible solutions.

Fig. 16 Unique solution by adding one constraint from a length ratio: a) the likelihood from the length ratio constraint; b) the joint likelihood from the two constraints, with one peak point

Fig. 17 Localized points in the camera coordinate system (unit in cm)
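The camera-side localization mentioned in the following paragraph, expressing the camera center in the corner-defined reference frame, reduces to a change of coordinates. A minimal sketch, assuming M, M1, M2, M3 are the reconstructed 3D corner points in the camera frame and that the reference axes are taken along the three corner edges; the function name is ours.

```python
import numpy as np

def camera_in_corner_frame(M, M1, M2, M3):
    # Reference frame: origin at M, axes along the three (near-orthogonal)
    # edges M-M1, M-M2, M-M3, normalized to unit length.
    R = np.column_stack([
        (M1 - M) / np.linalg.norm(M1 - M),
        (M2 - M) / np.linalg.norm(M2 - M),
        (M3 - M) / np.linalg.norm(M3 - M),
    ])
    # The camera center is the origin of the camera frame; a point p maps
    # to corner coordinates q via p = M + R q, so q = R^-1 (0 - M).
    return np.linalg.solve(R, -M)
```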

The algorithm can thus complement existing non-linear iteration methods by providing a good initial guess for result refinement. To remove the ambiguity, we added one more constraint to uniquely define the normal of the plane L1-L2: the length ratio (valued 1.5) of the length (M-M1) to the width (M-M2) of the book. The computation results are illustrated in Fig. 16 (a). The joint likelihood (see Fig. 16 (b)) was then computed from the two constraints by combining the results of Fig. 15 (a) and Fig. 16 (a). From the joint likelihood, we searched for the maximum likelihood and derived a unique normal of the plane defined by L1 and L2, which is [0.022, 0.8425, 0.5387]^T. Once the L1-L2 plane normal is determined, the normals of the other two planes can be uniquely computed. We used one actual length to compute the plane distance and localize the corners of the book in the camera coordinate system; the positions are shown in Fig. 17.

We validated the computation results based on the fact that the three lines defined by M-M1, M-M2, and M-M3 are mutually orthogonal. We calculated the three angles from the computed 3D coordinates of the points in Fig. 17. The computed angles are 89.4°, 88.1°, and 94.1°, respectively, with absolute errors of 0.6°, 1.9°, and 4.1°; the relative errors are 0.7 %, 2.1 %, and 4.6 %, with an average of 2.4 %. The results demonstrate that the BPP algorithm is accurate for visual localization. We also localized the camera in the reference coordinate system defined by the corner structure, i.e., with M the origin and L1, L2, L3 the three axes, by a simple coordinate system transformation; the result is [21.64, 23.82, 12.26]^T (in cm). Hence, the object and the camera can be localized with respect to each other.

Another experiment was performed to extend the BPP algorithm to camera calibration. As shown in Fig. 18, there is a rectangular mouse pad on the desktop.

Fig. 18 Camera calibration from generalized constraints. Left: original image. Right: two constraints generated from an extracted line (L1) and points (M1, M2, and M3)

Fig. 19 Likelihoods of the focal length from: a) the length ratio; b) the orthogonal line constraint; c) the joint constraints
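The calibration extension searches a 1D space of focal lengths (Eq. (27)). A sketch under our own naming, assuming minimal calibration (zero skew, unit aspect ratio, central principal point) so that each candidate f fixes K, and using the standard relation that the normal of a plane with vanishing line l is proportional to K^T l.

```python
import numpy as np

def mls_focal(l, constraints, width, height, sigma, f_range=(100, 10000)):
    # l           : vanishing line of the reference plane (known, cf. [7,13])
    # constraints : list of (Q, u), with Q(K, N) computing an attribute
    best_f, best_log = None, -np.inf
    for f in range(f_range[0], f_range[1] + 1):   # one-pixel resolution
        # Minimal calibration: zero skew, unit aspect ratio, central point.
        K = np.array([[f, 0.0, width / 2.0],
                      [0.0, f, height / 2.0],
                      [0.0, 0.0, 1.0]])
        N = K.T @ l                    # plane normal from the vanishing line
        N /= np.linalg.norm(N)
        # Log of the joint likelihood in Eq. (27).
        log_l = -sum((Q(K, N) - u) ** 2 / (2.0 * u * u * sigma * sigma)
                     for Q, u in constraints)
        if log_l > best_log:
            best_log, best_f = log_l, f
    return best_f
```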

Fig. 20 Three corner points (non-collinear) randomly chosen from a chessboard pattern as the three control points

From the mouse pad, the vanishing line of the desktop plane was computed as [0.6, 0.15, 1.0]^T. In order to test the calibration algorithm with generalized constraints, we chose two representative constraints: 1) one out-of-plane constraint, that L1 is orthogonal to the desktop plane; and 2) one planar constraint, the length ratio (valued 1.2) between the two segments M2-M3 and M1-M2. The searching space for the focal length was set from 100 to 10,000 pixels, with a partition resolution of one pixel; the corresponding horizontal view angles range from 169° to 13° (see APPENDIX B), covering a large range from wide-angle to high-resolution lenses. The likelihood of each sampled focal length was computed using Eq. (27). Fig. 19 shows the likelihoods generated from the individual and joint constraints. There is one peak in each curve, which demonstrates that the focal length can be computed from a single constraint. The peak was searched in the joint likelihood, and the corresponding focal length is 1388 pixels. Compared with the calibration result, the absolute error is 18.8 pixels, corresponding to a 1.4 % relative error. This demonstrates that the proposed model can be extended to camera calibration by exploiting different constraints.

In the last experiment, the proposed BPP algorithm was applied to solve the well-known P3P problem. We define a support plane from the three control points (see Fig. 20). It can be proved that the determination of the support plane is a necessary and sufficient condition for solving the P3P problem (see APPENDIX C). The three point-to-point distances between the control points allow the computation of three angles and three length ratios as the geometric constraints, with which we applied the MLS-M model to compute the support plane and localize the three control points.

Fig. 21 Images of the chessboard pattern for camera calibration and for plane normal recovery with the proposed algorithm, numbered 1, 2, 3, 4 from left to right
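In this P3P setting, the three known inter-point distances yield the three angles of the control-point triangle (by the law of cosines) and three length ratios, which serve as the u_i values for the angle and ratio attributes sketched in Section 2.2. A small illustration of this constraint construction, with a hypothetical helper name:

```python
import numpy as np

def p3p_constraint_values(d12, d13, d23):
    # Known side lengths of the control-point triangle -> the six
    # non-absolute attribute values: three angles (law of cosines, at
    # vertices P1, P2, P3) and three length ratios.
    a1 = np.degrees(np.arccos((d12**2 + d13**2 - d23**2) / (2 * d12 * d13)))
    a2 = np.degrees(np.arccos((d12**2 + d23**2 - d13**2) / (2 * d12 * d23)))
    a3 = 180.0 - a1 - a2
    ratios = [d12 / d13, d12 / d23, d13 / d23]
    return [a1, a2, a3], ratios
```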

Table 1 The normal computation results from the three control points

                      Img1                    Img2                    Img3                   Img4
Computed normal       [0.078, 0.823, 0.562]   [0.034, 0.634, 0.773]   [0.7, 0.134, 0.701]    [0.029, 0.935, 0.354]
Ground truth normal   [0.076, 0.825, 0.56]    [0.03, 0.631, 0.776]    [0.697, 0.133, 0.705]  [0.027, 0.934, 0.356]
Angle error (in °)    0.2                     0.33                    0.26                   0.2

In this experiment, a different camera, a Nikon Cool-Pix 4100, was used to take the images, which have a resolution of 1024×768 pixels (see some of the images in Fig. 21). The three control points (non-collinear) were chosen randomly from the grid points of a chessboard pattern, as shown in Fig. 20, so that the support plane coincides with the pattern plane. The chessboard pattern was used for two purposes: 1) accurate camera calibration with Zhang's approach [28] (the calibrated focal length is 4175.5 pixels); and 2) obtaining the ground truth for the plane normal and the point-to-camera distances of the three control points. The ground truth was obtained from the calibrated camera exterior parameters, i.e., the rotation matrix and the translation vector.

The plane normals computed from the four images are presented in Table 1, whose second and third rows give the computed and the ground truth plane normals. The angle between them is defined as the computation error and is presented in the last row (in degrees). The maximum and the average angle errors are 0.33° and 0.25°, respectively, which demonstrates that the computation model is accurate. The distances of the control points were computed afterwards. Table 2 presents the computed distances from the four images, where X̃_i and ‖X_i‖ (i = 1, 2, 3) denote the computed and ground truth distances, respectively, and their difference defines the distance computation error. From Table 2, the maximum and average distance computation errors are 0.8 cm and 0.4 cm, respectively, over distances from 1 m to 2.2 m, which demonstrates that the BPP algorithm is accurate.

Table 2 The computed distances between the control points and the camera (unit in cm)

        X̃_1     ‖X_1‖   error   X̃_2     ‖X_2‖   error   X̃_3     ‖X_3‖   error
Img1    164.3   163.5   0.7     160.6   159.8   0.8     176.6   175.8   0.8
Img2    109.3   109.0   0.3     102.4   102.2   0.2     114.8   114.5   0.3
Img3    114.0   114.6   0.5     113.1   113.6   0.6     126.1   126.5   0.5
Img4    199.8   199.9   0.1     204.7   204.6   0.1     216.9   216.9   0.0

The multiple solutions to P3P were also illustrated with the proposed BPP algorithm; multiple solutions to P3P correspond to multiple support planes. Since P3P gives two solutions most of the time, we demonstrate a typical two-solution P3P problem in the test (the classification of the two solutions for P3P is referred to [11,25]). As can be observed in Fig. 22, there are two peak points in the likelihood map computed from the original image on the left. The likelihood map is highlighted to show the positions of the 300 plane normals with the highest likelihoods by thresholding.

Fig. 22 Illustration of the two solutions to a practical P3P problem: a) left: original image; b) top right: likelihood map with two dominant local maxima; c) bottom right: positions of the 3 plane normals with the highest likelihoods and the two computed normals (marked with "+")

We can observe two dominant areas segmented from the map. For each area, we computed the local maximum likelihood to derive two plane normals, [0.0699 0.8883 0.4540]^T and [0.0917 0.8019 0.5904]^T (see the two positions marked with "+" in Fig. 22). Hence, the proposed BPP algorithm can not only solve a P3P problem but also classify its solutions.

5 Conclusions and recommendations

Planar scenes are common in daily life and provide rich information for visual localization. In this paper, we proposed the Perspective-Plane problem, a similar but more general localization problem than the well-known PnP and PnL problems. We addressed the problem within the Bayesian framework and proposed the Bayesian Perspective-Plane (BPP) algorithm. The core idea is the computation of the plane normal with a Maximum Likelihood Searching Model (MLS-M) using generalized planar and out-of-plane constraints. The likelihood of each constraint is modeled with a normalized Gaussian function, and 2D and 1D searching modes were proposed to find the maximum likelihood and compute the plane normal. The proposed BPP algorithm was tested with both simulated and real image data, which show that it generalizes to different types of constraints for accurate object localization. Extensions of the BPP algorithm to camera calibration were illustrated in the experiments, with good results reported. Moreover, the BPP algorithm was successfully applied to solve the well-known Perspective-Three-Point (P3P) problem and to classify its solutions. The results demonstrate that the proposed BPP algorithm is practical and flexible, with good potential for visual localization. Future research directions based on the current work are recommended as follows: 1) incorporate inequality constraints into the model, such as one angle being larger than another; 2) improve the likelihood model to deal with different scales, units, etc.; 3) develop fast implementations of the maximum likelihood searching model (MLS-M), especially for high partition resolutions of the Gaussian hemisphere; possible approaches include a coarse-to-fine searching strategy and

a non-uniform distribution of plane normals on the Gaussian sphere; and 4) extend the proposed models and methods to more general scenes beyond planar ones, such as curved surfaces, spheres, etc.

Acknowledgments The work presented in this paper was sponsored by the National Natural Science Foundation of China (NSFC) (No. 5128168), the Tianjin Natural Science Foundation (No. 13JCYBJC377), the Youth Top-Notch Talent Plan of Hebei Province, China, the Fundamental Research Funds for the Central Universities (WUT: 214-IV-68), and the Grant-in-Aid for Scientific Research Program (No. 149) from the Japan Society for the Promotion of Science (JSPS).

Appendixes

A. Uniform sampling on a circle in 3D space

We can uniformly sample the circle of Eq. (19) by first sampling a standard circle and then mapping the sampled points via a rotation. This is implemented in three steps:

Step 1. Uniformly sample the standard circle, which has the equation
$X^2 + Y^2 = 1, \quad Z = 0$   (29)
This can be done by uniformly sampling the angle space, so that a sampled point is represented by $[\cos\theta_i \;\; \sin\theta_i \;\; 0]^T$, with $\theta_i \in [0, 2\pi)$.

Step 2. Compute the rotation. The two circles defined by Eqs. (19) and (29) are mapped via a rotation matrix that satisfies
$R \, [0 \;\; 0 \;\; 1]^T = N$   (30)
The rotation matrix is not uniquely determined, because a circle is invariant to rotation around its normal. In practice, we can use any two orthonormal vectors orthogonal to N as the first and second column vectors, so that Eq. (30) holds.

Step 3. Transform the sampled points by the rotation. Each sampled point on the standard circle is transformed with the rotation matrix:
$R \begin{bmatrix} \cos\theta_i \\ \sin\theta_i \\ 0 \end{bmatrix} = \begin{bmatrix} r_1\cos\theta_i + r_4\sin\theta_i \\ r_2\cos\theta_i + r_5\sin\theta_i \\ r_3\cos\theta_i + r_6\sin\theta_i \end{bmatrix}$   (31)

Hence, a uniform sampling of the circle of Eq. (19) is accomplished. The angles between two neighboring sampled points before and after the transformation are identical, because
$\left( R \begin{bmatrix} \cos\theta_i \\ \sin\theta_i \\ 0 \end{bmatrix} \right)^{\!T} \left( R \begin{bmatrix} \cos\theta_{i-1} \\ \sin\theta_{i-1} \\ 0 \end{bmatrix} \right) = \begin{bmatrix} \cos\theta_i \\ \sin\theta_i \\ 0 \end{bmatrix}^{T} \begin{bmatrix} \cos\theta_{i-1} \\ \sin\theta_{i-1} \\ 0 \end{bmatrix}$   (32)
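The three steps translate directly into code. The sketch below is our own minimal Python implementation under the stated construction: the first two columns of R are an arbitrary orthonormal pair orthogonal to N, so that R[0 0 1]^T = N as in Eq. (30); function and variable names are hypothetical.

import numpy as np

def sample_circle(N, num_samples=360):
    # Step 2: build a rotation whose third column is the unit normal N.
    N = np.asarray(N, dtype=float)
    N = N / np.linalg.norm(N)
    seed = np.array([1.0, 0.0, 0.0]) if abs(N[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    r1 = np.cross(N, seed)
    r1 /= np.linalg.norm(r1)
    r2 = np.cross(N, r1)                 # orthonormal to both N and r1
    R = np.column_stack([r1, r2, N])     # satisfies R @ [0, 0, 1] = N, Eq. (30)
    # Step 1: uniform samples on the standard circle of Eq. (29).
    theta = np.linspace(0.0, 2.0 * np.pi, num_samples, endpoint=False)
    standard = np.column_stack([np.cos(theta), np.sin(theta), np.zeros(num_samples)])
    # Step 3: rotate the samples onto the target circle, Eq. (31).
    return standard @ R.T

Because R is orthogonal, inner products between neighboring samples are preserved, which is exactly the content of Eq. (32).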

B. Defining the searching space for the focal length

It is difficult to define the searching space of the focal length directly, since images come in different resolutions in practice. Instead, we define the searching space from the camera view angles. Three view angles are defined for a camera: horizontal, vertical, and diagonal. In this paper, we use the horizontal one, defined as
$\theta_h = 2\tan^{-1}\!\left(\frac{h}{2f}\right)$   (33)
where h is the horizontal resolution of the image and f is the camera focal length. As a result, we can estimate the focal length from the view angle and the image resolution:
$f = \frac{h}{2\tan(\theta_h/2)}$   (34)
Usually, we have some prior knowledge of commonly used lenses. We can thus define the searching space of the focal length from the range of the view angle and the image resolution.
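As a small illustration of this mapping, the sketch below (our own, with hypothetical names) converts a prior view-angle range and the image resolution into a focal-length search interval via Eq. (34):

import math

def focal_search_range(h, theta_wide_deg, theta_narrow_deg):
    # Eq. (34): f = h / (2 tan(theta_h / 2)); wide angles give short focal lengths.
    to_f = lambda theta_deg: h / (2.0 * math.tan(math.radians(theta_deg) / 2.0))
    return to_f(theta_wide_deg), to_f(theta_narrow_deg)

# e.g., f_min, f_max = focal_search_range(1024, 169.0, 13.0)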

C. Lemma: determining the support plane is a necessary and sufficient condition for solving the P3P problem

Proof:
1) Necessity. From a solved P3P problem, the distances of the three control points to the camera are known. Hence, we can determine the 3D coordinates of each control point from
$X_i = \lambda K^{-1} x_i, \quad \|X_i\| = d_i$   (35)
With the recovered control points, the support plane is then uniquely determined.
2) Sufficiency. From a determined support plane, the plane normal and distance are known. As a result, we can use Eq. (4) to compute the 3D position of each control point. The distances between the three points then follow readily from the 3D coordinates. Hence, the P3P problem is solved.

References

1. Adan A, Martin A, Valero E, Merchan P (2009) Landmark real-time recognition and positioning for pedestrian navigation. CIARP, Guadalajara, Mexico
2. Cham T, Arridhana C et al (2010) Estimating camera pose from a single urban ground-view omnidirectional image and a 2D building outline map. CVPR, San Francisco, CA
3. Criminisi A, Reid I, Zisserman A (2000) Single view metrology. International Journal of Computer Vision 40(2):123-148
4. Desouza GN, Kak AC (2002) Vision for mobile robot navigation: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(2):237-267
5. Durrant-Whyte H, Bailey T (2006) Simultaneous localization and mapping (SLAM): part I the essential algorithms. IEEE Robotics and Automation Magazine 13(2):99-110
6. Guan P, Weiss A, Balan A, Black M (2009) Estimating human shape and pose from a single image. ICCV, Kyoto, Japan
7. Guo F, Chellappa R (2010) Video metrology using a single camera. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(7):1329-1335
8. Hartley R, Zisserman A (2004) Multiple view geometry in computer vision, 2nd edn. Cambridge University Press, Cambridge
9. Horn BKP (1987) Closed-form solution of absolute orientation using unit quaternions. J Opt Soc Am A 4(4):629-642
10. Hu Z, Matsuyama T (2012) Bayesian perspective-plane (BPP) for localization. International Conference on Computer Vision Theory and Applications (VISAPP), Rome, Italy, pp 241-246
11. Kneip L, Scaramuzza D, Siegwart R (2011) A novel parameterization of the perspective-three-point problem for a direct computation of absolute camera position and orientation. Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
12. Lategahn H, Stiller C (2014) Vision-only localization. IEEE Transactions on Intelligent Transportation Systems. DOI: 10.1109/TITS.2014.2298492
13. Lee DC, Hebert M, Kanade T (2009) Geometric reasoning for single image structure recovery. CVPR
14. Leopardi P (2006) A partition of the unit sphere into regions of equal area and small diameter. Electronic Transactions on Numerical Analysis 25(12):309-327
15. Li S, Xu C (2011) A stable direct solution of perspective-three-point problem. International Journal of Pattern Recognition and Artificial Intelligence 25(11):627-642
16. Liebowitz D, Zisserman A (1998) Metric rectification for perspective images of planes. CVPR, Santa Barbara, CA
17. Pretto A, Tonello S, Menegatti E (2013) Flexible 3D localization of planar objects for industrial bin-picking with monocamera vision system. IEEE International Conference on Automation Science and Engineering, pp 168-175
18. Li SQ, Xu C, Xie M (2012) A robust O(n) solution to the perspective-n-point problem. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(7):1444-1461
19. Schneider D, Fu X, Wong K (2010) Reconstruction of display and eyes from a single image. CVPR, San Francisco, CA
20. Shi F, Zhang X, Liu Y (2004) A new method of camera pose estimation using 2D-3D corner correspondence. Pattern Recognition Letters 25(1):85-89
21. Sun Y, Yin L (2008) Automatic pose estimation of 3D facial models. ICPR, FL, USA
22. Wang G, Hu Z, Wu F, Tsui H (2005) Single view metrology from scene constraints. Image and Vision Computing 23(9):831-840
23. Wang R, Jiang G, Quan L, Wu C (2012) Camera calibration using identical objects. Machine Vision and Applications 23(3):579-587
24. Witkin AP (1981) Recovering surface shape and orientation from texture. Artificial Intelligence 17(1-3):17-45
25. Wolfe W, Mathis D, Sklair C, Magee M (1991) The perspective view of 3 points. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(1):66-73
26. Wu Y, Li X, Wu F, Hu Z (2006) Coplanar circles, quasi-affine invariance and calibration. Image and Vision Computing 24(4):319-326
27. Zakhor A, Hallquist A (2013) Single view pose estimation of mobile devices in urban environments. Proceedings of the 2013 IEEE Workshop on Applications of Computer Vision (WACV), pp 347-354
28. Zhang Z (2000) A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(11):1330-1334
29. Zhang B, Li Y (2008) Dynamic calibration of the relative pose and error analysis in a structured light system. J Opt Soc Am A 25(3):612-622
30. Zhu X, Ramanan D (2012) Face detection, pose estimation, and landmark localization in the wild. CVPR, Providence, RI

Zhaozheng Hu received the Bachelor and PhD degrees from Xi'an Jiaotong University, China, in 2002 and 2007, respectively, both in information and communication engineering. During his PhD studies, he visited the computer vision lab at the Chinese University of Hong Kong from November 2004 to June 2005. From July 2007 to August 2009, he worked as a postdoctoral fellow at the Georgia Institute of Technology, USA. From July 2010 to July 2012, he was with the visual information processing lab at Kyoto University, Japan, under a JSPS Fellowship program. He is currently a professor at Wuhan University of Technology, Wuhan, China. His research topics mainly focus on visual geometry, stereo vision, intelligent transportation systems, and intelligent surveillance systems.

Takashi Matsuyama received the BEng, MEng, and DEng degrees in electrical engineering from Kyoto University, Japan, in 1974, 1976, and 1980, respectively. Currently, he is a professor in the Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University. His research interests include knowledge-based image understanding, computer vision, 3D video, human-computer interaction, and smart energy management. He has written more than 100 papers and books, including two research monographs: A Structural Analysis of Complex Aerial Photographs (Plenum, 1980) and SIGMA: A Knowledge-Based Aerial Image Understanding System (Plenum, 1990). He has won 10 best paper awards from Japanese and international academic societies, including the Marr Prize at ICCV'95. He is on the editorial board of the Pattern Recognition journal. He was awarded fellowships from the International Association for Pattern Recognition, the Information Processing Society of Japan, and the Institute of Electronics, Information and Communication Engineers, Japan. He is a member of the IEEE.