Jump Stitch Metadata & Depth Maps
Version 1.1
jump-help@google.com

Contents

1. Introduction
2. Stitch Metadata File Format
3. Coverage Near the Poles
4. Coordinate Systems
5. Camera Model
6. Panorama projection model
   6.1. Projecting from the Panorama to a 3D Ray
   6.2. Projecting from a 3D Point to the Panorama
   6.3. Disparity in the Panorama
7. Depth Maps
A. Example Stitch Metadata
B. Version History

1. Introduction

This document defines the stitch metadata and depth map file formats produced by the Jump Assembler. It also describes the equations by which depth map values map to 3D points in the scene and how 3D scene points project into the source cameras and the panorama.

2. Stitch Metadata File Format

For every shot, the Jump Assembler exports a JSON formatted metadata file. This file contains information about the rig, including calibration parameters for each of the cameras, the vertical field of view of the shot, and the interpupillary distance. See Appendix A for an example.

In this section we describe each field found in the metadata file. Some field names are followed by the mathematical symbol(s) that represent their value in equations in this document. The number in brackets after each field gives the first version of the metadata spec in which it appeared. All fields are included in each metadata file even if they are not directly relevant; for example, depth related fields are included even in a stitching session which contains no depth stitches.

major version number (v1.0): Major version of the metadata file. This value will be incremented when changes are made to the file format which are not backward compatible.

minor version number (v1.0): Minor version of the metadata file. Incrementing this value indicates that one or more new fields have been added to the format. Any other type of change will result in an update to the major version number instead.

interpupillary distance in meters (v1.0): The interpupillary distance (IPD) used for rendering the stitch, measured in meters.

rig type (v1.0): A string defining the type of rig used for the capture. Currently the only valid values are Odyssey and Halo.

rig id (v1.0): A string that gives a globally unique rig id. Two shots with the same rig id must have been captured using the same physical rig, although calibration may still change between the shots as individual components move or are replaced.

missing bottom coverage in degrees (v1.0): The elevation, in degrees from the bottom of the stitch, where content is missing for the color stitch.

missing top coverage in degrees (v1.0): The elevation, in degrees from the top of the stitch, where content is missing for the color stitch.

missing bottom depth coverage in degrees (v1.1): The elevation, in degrees from the bottom of the stitch, where content is missing for the depth stitch.

missing top depth coverage in degrees (v1.1): The elevation, in degrees from the top of the stitch, where content is missing for the depth stitch.

mono top coverage in degrees (v1.1): The elevation range, in degrees, for which the top of the stitch is monoscopic. See Section 3 for more details.

mono bottom coverage in degrees (v1.1): The elevation range, in degrees, for which the bottom of the stitch is monoscopic. See Section 3 for more details.

partial stereo top coverage in degrees (v1.1): The elevation range, in degrees, over which the top of the stitch transitions from mono to stereo. See Section 3 for more details.

partial stereo bottom coverage in degrees (v1.1): The elevation range, in degrees, over which the bottom of the stitch transitions from mono to stereo. See Section 3 for more details.

minimum encodable depth in meters (v1.0): Depth stitches are encoded as inverse depth, and this value is the minimum depth which can be encoded. See Section 7 for more details.

distortion delta in meters (v1.0): δ - Jump stitches use an omnidirectional stereo (ODS) projection that has been distorted to better fit the physical camera geometry. This value is the distance by which ODS viewing ray origins are shifted to produce the distorted ODS projection. See Section 6 for more details.

frames per second (v1.0): The framerate of the stitch.

cameras (v1.0): An array of cameras which provides the intrinsics and extrinsics of each camera in the rig as described below.

Each camera in the cameras array contains the following fields:

name (v1.0): The name of the camera. By convention this is the name of the video file corresponding to that camera, without the file extension.

position (v1.0): [c_x, c_y, c_z] - The position of the camera in world coordinates, given in meters.

orientation (v1.0): [a_x, a_y, a_z] - The orientation of the camera in angle-axis format, where the angle in radians is equal to the magnitude of the vector.

projection type (v1.0): The projection model used for the camera, either perspective or fisheye. For the Odyssey and Halo rigs this value will always be fisheye.

image size (v1.0): [w, h] - The width and height of the camera's image in pixels.

principal point (v1.0): [u_0, v_0] - The horizontal and vertical position of the camera's principal point in pixels.

radial distortion (v1.0): [r_0, r_1] - First and second order radial distortion parameters for the camera.

focal length (v1.0): f - Focal length of the camera in pixels.
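As a brief illustration of how this file can be consumed, the sketch below (Python with the standard json module and NumPy; the function name and file path are illustrative, not part of the format) reads a metadata file and collects the rig-level and per-camera parameters referenced throughout this document.

import json
import numpy as np

def load_stitch_metadata(path):
    # Minimal sketch of parsing a Jump stitch metadata file (field names per Appendix A).
    with open(path, "r") as f:
        meta = json.load(f)

    # Rig-level parameters used in Sections 6 and 7.
    ipd = meta["interpupillary_distance_in_meters"]
    delta = meta["distortion_delta_in_meters"]
    min_depth = meta["minimum_encodable_depth_in_meters"]

    cameras = []
    for cam in meta["cameras"]:
        cameras.append({
            "name": cam["name"],
            "position": np.array(cam["position"]),          # [c_x, c_y, c_z], meters
            "orientation": np.array(cam["orientation"]),    # angle-axis [a_x, a_y, a_z]
            "projection_type": cam["projection_type"],      # "fisheye" or "perspective"
            "image_size": cam["image_size"],                 # [w, h], pixels
            "principal_point": cam["principal_point"],       # [u_0, v_0], pixels
            "radial_distortion": cam["radial_distortion"],   # [r_0, r_1]
            "focal_length": cam["focal_length"],             # f, pixels
        })
    return ipd, delta, min_depth, cameras

# Example usage (the file name is hypothetical):
# ipd, delta, min_depth, cameras = load_stitch_metadata("stitch_metadata.json")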

Table 1: Metadata field names and the mathematical symbols corresponding to them.

    JSON Field                    Symbol
    distortion delta in meters    δ
    camera::position              [c_x, c_y, c_z]
    camera::orientation           [a_x, a_y, a_z]
    camera::image size            [w, h]
    camera::principal point       [u_0, v_0]
    camera::radial distortion     [r_0, r_1]
    camera::focal length          f

3. Coverage Near the Poles

The ODS projection does not accurately model stereo near the poles. One way to avoid the distortion this introduces is to fade to a monoscopic stitch near the poles. This is done by reducing the IPD of the stitch near the poles. Depending on the rig used for capture it is also possible that content is not available at one or both of the poles. This section defines the parameters used to describe the stitch near the poles.

At the top and bottom pole there are three distinct regions which don't exhibit fully stereo content. The layout of the regions is shown below (listed from the top of the stitch to the bottom), and the corresponding ranges in a sample stitch are shown in Figure 1.

    missing top coverage in degrees
    mono top coverage in degrees
    partial stereo top coverage in degrees
    Main region of stitch (fully stereo)
    partial stereo bottom coverage in degrees
    mono bottom coverage in degrees
    missing bottom coverage in degrees

We will describe these regions for the top of the stitch; the ones at the bottom are exactly analogous.

The first region contains no valid image data; color is diffused in from the boundary of the next region. This region spans an elevation range from zero to missing top coverage in degrees.

The second region contains a monoscopic stitch (corresponding to an IPD of zero).

This region spans an elevation range from missing top coverage in degrees to (missing top coverage in degrees + mono top coverage in degrees).

The third region contains a stereo stitch with varying IPD. The exact value of the IPD in this region is not specified, but it is guaranteed to be monotonically increasing from zero at the beginning of the region to interpupillary distance in meters at the end of the region. This region spans an elevation range from (missing top coverage in degrees + mono top coverage in degrees) to (missing top coverage in degrees + mono top coverage in degrees + partial stereo top coverage in degrees).

Figure 1 shows the values described above for one eye from a stitch captured with a Halo rig. Section 6 defines exactly how these values interact with IPD.

Figure 1: Sample color frame for a single eye from a Halo rig with coverage values marked.

For depth stitches we only provide depth information in the region for which we have full stereo. This means that only one value is required for the top pole and one for the bottom pole. The region at the top of the stitch for which we have no depth data spans an elevation range of zero to missing top depth coverage in degrees. Within this region depth is diffused from the region boundary. This is shown in Figure 2.

Figure 2: Sample depth frame for a single eye from a Halo rig with coverage values marked.

4. Coordinate Systems

In image space the top left corner of the top left pixel has coordinates [0, 0]. The center of the top left pixel lies at [0.5, 0.5], and for an image of size w × h the center of the bottom right pixel is at [w − 0.5, h − 0.5].

We use a right handed coordinate system. A camera at the origin which has not been rotated looks down the positive z axis, with the positive x axis pointing to the right and the positive y axis pointing down.

The output stitches are over-under equirectangular in which the left eye is on top. The positive y axis points down the image. The positive z axis corresponds to the left edge of the stitch.

5. Camera Model

In this section we describe how a point in world space with homogeneous coordinates [p_x, p_y, p_z, p_w] is projected into the image space of one of the cameras in the rig with coordinates [u, v]. The process can be broken down into two steps as follows:

1. Transform Into Camera Space

For a camera with position [c_x, c_y, c_z] and orientation [a_x, a_y, a_z] we transform a point in homogeneous coordinates in world space [p_x, p_y, p_z, p_w] to a point in camera space [x, y, z] using the following equation:

    \begin{bmatrix} x \\ y \\ z \end{bmatrix} = R \left( \begin{bmatrix} p_x/p_w \\ p_y/p_w \\ p_z/p_w \end{bmatrix} - \begin{bmatrix} c_x \\ c_y \\ c_z \end{bmatrix} \right)    (1)

Here, the 3×3 rotation matrix R can be computed from the angle-axis representation of orientation [a_x, a_y, a_z] as follows:

    \theta = \sqrt{a_x^2 + a_y^2 + a_z^2}    (2)

    \begin{bmatrix} \hat{a}_x \\ \hat{a}_y \\ \hat{a}_z \end{bmatrix} = \frac{1}{\theta} \begin{bmatrix} a_x \\ a_y \\ a_z \end{bmatrix}    (3)

    R = \begin{bmatrix}
    \hat{a}_x \hat{a}_x (1 - \cos\theta) + \cos\theta & \hat{a}_x \hat{a}_y (1 - \cos\theta) - \hat{a}_z \sin\theta & \hat{a}_x \hat{a}_z (1 - \cos\theta) + \hat{a}_y \sin\theta \\
    \hat{a}_x \hat{a}_y (1 - \cos\theta) + \hat{a}_z \sin\theta & \hat{a}_y \hat{a}_y (1 - \cos\theta) + \cos\theta & \hat{a}_y \hat{a}_z (1 - \cos\theta) - \hat{a}_x \sin\theta \\
    \hat{a}_x \hat{a}_z (1 - \cos\theta) - \hat{a}_y \sin\theta & \hat{a}_y \hat{a}_z (1 - \cos\theta) + \hat{a}_x \sin\theta & \hat{a}_z \hat{a}_z (1 - \cos\theta) + \cos\theta
    \end{bmatrix}    (4)

Here, θ is the angle of rotation around the unit vector [â_x, â_y, â_z].

2. Project Into Image Space

Given focal length f, principal point [u_0, v_0] and radial distortion [r_0, r_1], we can now, depending on the projection type, project a point [x, y, z] in camera space into the image as follows.

fisheye

    \alpha = \tan^{-1}\!\left( \frac{\sqrt{x^2 + y^2}}{z} \right)    (5)

    \begin{bmatrix} \hat{x} \\ \hat{y} \end{bmatrix} = \frac{\alpha}{\sqrt{x^2 + y^2}} \begin{bmatrix} x \\ y \end{bmatrix}    (6)

    d = 1 + r_0 \alpha^2 + r_1 \alpha^4    (7)

    \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} f d \hat{x} + u_0 \\ f d \hat{y} + v_0 \end{bmatrix}    (8)

Here, α is the angle between the optical axis and the ray to the point [x, y, z], and d is the radial distortion factor.
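To make the two steps above concrete, here is a minimal sketch of equations (1)-(8) for the fisheye case, written in Python with NumPy; the language, the helper names and the dictionary layout of the camera record are assumptions of this illustration, not part of the spec. The perspective case described next differs only in how [x̂, ŷ] and d are computed.

import numpy as np

def angle_axis_to_rotation_matrix(a):
    # Equations (2)-(4): rotation matrix from the angle-axis vector [a_x, a_y, a_z].
    theta = np.linalg.norm(a)
    if theta < 1e-12:
        return np.eye(3)
    ax, ay, az = a / theta                  # unit axis, equation (3)
    c, s = np.cos(theta), np.sin(theta)
    C = 1.0 - c
    return np.array([
        [ax * ax * C + c,      ax * ay * C - az * s, ax * az * C + ay * s],
        [ax * ay * C + az * s, ay * ay * C + c,      ay * az * C - ax * s],
        [ax * az * C - ay * s, ay * az * C + ax * s, az * az * C + c],
    ])

def project_fisheye(p_world, camera):
    # Equations (1) and (5)-(8): homogeneous world point -> pixel [u, v], fisheye model.
    # `camera` holds position, orientation, focal_length, principal_point and
    # radial_distortion as read from the metadata (Section 2).
    R = angle_axis_to_rotation_matrix(np.asarray(camera["orientation"]))
    p = np.asarray(p_world[:3]) / p_world[3]              # de-homogenize [p_x, p_y, p_z, p_w]
    x, y, z = R @ (p - np.asarray(camera["position"]))    # equation (1)

    rho = np.hypot(x, y)
    alpha = np.arctan2(rho, z)             # equation (5); the atan2 form also covers z <= 0
    scale = alpha / rho if rho > 0 else 0.0
    xh, yh = scale * x, scale * y          # equation (6)

    r0, r1 = camera["radial_distortion"]
    d = 1.0 + r0 * alpha**2 + r1 * alpha**4               # equation (7), radial distortion factor
    f = camera["focal_length"]
    u0, v0 = camera["principal_point"]
    return np.array([f * d * xh + u0, f * d * yh + v0])   # equation (8)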

perspective

    \begin{bmatrix} \hat{x} \\ \hat{y} \end{bmatrix} = \frac{1}{z} \begin{bmatrix} x \\ y \end{bmatrix}    (9)

    d = 1 + r_0 \left( \hat{x}^2 + \hat{y}^2 \right) + r_1 \left( \hat{x}^2 + \hat{y}^2 \right)^2    (10)

    \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} f d \hat{x} + u_0 \\ f d \hat{y} + v_0 \end{bmatrix}    (11)

Here, d is the radial distortion factor.

6. Panorama projection model

In this section we describe the mapping from a point in the stitched panorama to a 3D ray and the mapping from 3D points back into the panorama.

Jump videos use the omnidirectional stereo (ODS) projection. A summary of the ODS model and how it can be used to render imagery can be found in Rendering Omnidirectional Stereo Content [1]. That document describes the ideal ODS projection, where viewing rays originate from a circle of diameter equal to the interpupillary distance (IPD). In practice, though, the diameter of a Jump camera rig is significantly larger than the IPD. To avoid holes in the stitch, we use a distorted ODS projection, as shown in Figure 3. In this model the rays originate from the circle on which the cameras lie. Visually this results in subjects that are close to the camera being vertically stretched.

[1] Currently Rendering Omni-directional Stereo Content and this document differ in their coordinate system conventions. We are working on fixing this discrepancy.

Figure 3: This figure shows the distorted ODS model where ray origins are shifted horizontally by the distance δ to o_left and o_right.

6.1. Projecting from the Panorama to a 3D Ray

Consider a pixel with position [θ, φ] in an equirectangular panorama, where θ ∈ [0, 2π] is the longitude, φ ∈ [0, π] is the latitude measured from the bottom of the panorama, and [0, 0] corresponds to the bottom left of the panorama. Then for the two eyes, the respective rays have the same direction

    d(\theta, \phi) = \begin{bmatrix} \sin\theta \sin\phi \\ \cos\phi \\ \cos\theta \sin\phi \end{bmatrix}    (12)

but different origins:

    o_{left}(\theta, \phi) = k \begin{bmatrix} \delta \sin\theta - r \cos\theta \\ 0 \\ \delta \cos\theta + r \sin\theta \end{bmatrix}, \qquad o_{right}(\theta, \phi) = k \begin{bmatrix} \delta \sin\theta + r \cos\theta \\ 0 \\ \delta \cos\theta - r \sin\theta \end{bmatrix}    (13)

where r = interpupillary distance in meters / 2.
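A minimal sketch of equations (12) and (13), mapping a panorama pixel [θ, φ] to its per-eye viewing ray (Python/NumPy; names are illustrative). The pole attenuation factor k is defined in equation 15 below and is passed in here as a parameter.

import numpy as np

def panorama_to_ray(theta, phi, r, delta, k, left_eye=True):
    # Equations (12)-(13): viewing ray for a panorama pixel [theta, phi].
    #   theta : longitude in [0, 2*pi]
    #   phi   : latitude in [0, pi], measured from the bottom of the panorama
    #   r     : interpupillary_distance_in_meters / 2
    #   delta : distortion_delta_in_meters
    #   k     : pole attenuation factor, equation (15)
    direction = np.array([
        np.sin(theta) * np.sin(phi),
        np.cos(phi),
        np.cos(theta) * np.sin(phi),
    ])                                              # equation (12)
    sign = -1.0 if left_eye else 1.0                # -r*cos(theta) for the left eye
    origin = k * np.array([
        delta * np.sin(theta) + sign * r * np.cos(theta),
        0.0,
        delta * np.cos(theta) - sign * r * np.sin(theta),
    ])                                              # equation (13)
    return origin, direction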

For the majority of the stitch k is 1, but near the poles it is reduced to avoid distortion (by reducing the effective IPD and distortion delta of the stitch):

    \phi_d = \phi \cdot 180 / \pi    (14)

    k = \begin{cases}
    0, & \text{if } \phi_d < a \\
    \text{undefined}, & \text{if } a \le \phi_d < b \\
    1, & \text{if } b \le \phi_d < c \\
    \text{undefined}, & \text{if } c \le \phi_d < d \\
    0, & \text{if } d \le \phi_d
    \end{cases}    (15)

where

    a = missing bottom coverage in degrees + mono bottom coverage in degrees    (16)
    b = a + partial stereo bottom coverage in degrees    (17)
    c = 180 - (missing top coverage in degrees + mono top coverage in degrees + partial stereo top coverage in degrees)    (18)
    d = c + partial stereo top coverage in degrees.    (19)

In the regions a ≤ φ_d < b and c ≤ φ_d < d the exact value of k is undefined. We only guarantee that in the region a ≤ φ_d < b it monotonically increases from 0 to 1 with increasing φ and in the region c ≤ φ_d < d it monotonically decreases from 1 to 0 with increasing φ.

6.2. Projecting from a 3D Point to the Panorama

To project a point [x, y, z] into the panorama use the following equations:

    \theta_{left} = \operatorname{atan2}(x, z) + \sin^{-1}\!\left( \frac{kr}{\sqrt{x^2 + z^2}} \right)    (20)

    \theta_{right} = \operatorname{atan2}(x, z) - \sin^{-1}\!\left( \frac{kr}{\sqrt{x^2 + z^2}} \right)    (21)

    \phi = \frac{\pi}{2} + \tan^{-1}\!\left( \frac{-y}{\sqrt{x^2 + z^2 - k^2 r^2} - k\delta} \right)    (22)

where k is defined in equation 15. When projecting an arbitrary 3D point we don't know what the correct value of k should be (since it depends on φ). This means that we must solve for k at the same time as φ. One way to do this is to initially assume a value of k = 1 and, if the resulting value of φ is not consistent with equation 15, iteratively update k.
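The sketch below (same Python/NumPy assumptions; the keys of the coverage dictionary are illustrative shorthand for the metadata fields of Section 2) implements equations (14)-(22), including the iterative update of k suggested above. In the transition regions, where only monotonicity of k is guaranteed, the sketch interpolates linearly, which is one valid choice.

import numpy as np

def attenuation_k(phi, coverage):
    # Equations (14)-(19): pole attenuation factor k for latitude phi (radians).
    phi_d = np.degrees(phi)                                      # equation (14)
    a = coverage["missing_bottom"] + coverage["mono_bottom"]     # equation (16)
    b = a + coverage["partial_stereo_bottom"]                    # equation (17)
    c = 180.0 - (coverage["missing_top"] + coverage["mono_top"]
                 + coverage["partial_stereo_top"])               # equation (18)
    d = c + coverage["partial_stereo_top"]                       # equation (19)
    if phi_d < a or phi_d >= d:
        return 0.0                      # missing or mono regions
    if phi_d < b:
        return (phi_d - a) / (b - a)    # bottom transition, 0 -> 1 (one valid choice)
    if phi_d < c:
        return 1.0                      # fully stereo region
    return (d - phi_d) / (d - c)        # top transition, 1 -> 0 (one valid choice)

def project_point_to_panorama(p, r, delta, coverage, iterations=5):
    # Equations (20)-(22): project [x, y, z] to (theta_left, theta_right, phi),
    # iteratively updating k as suggested in Section 6.2.
    x, y, z = p
    rho = np.hypot(x, z)
    k = 1.0                                                      # initial assumption
    for _ in range(iterations):
        horiz = np.sqrt(max(rho**2 - (k * r)**2, 0.0)) - k * delta
        phi = np.pi / 2 + np.arctan2(-y, horiz)                  # equation (22)
        k_new = attenuation_k(phi, coverage)
        if abs(k_new - k) < 1e-6:
            break
        k = k_new
    beta = np.arcsin(np.clip(k * r / rho, -1.0, 1.0))
    theta_left = np.arctan2(x, z) + beta                         # equation (20)
    theta_right = np.arctan2(x, z) - beta                        # equation (21)
    return theta_left, theta_right, phi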

6.3. Disparity in the Panorama

It is possible to estimate disparity in the panorama (the horizontal distance in pixels between corresponding points in the left and right eye panoramas) directly from the depth value for a pixel:

    \text{disparity} = \frac{W}{2\pi} \left( \pi - 2 \tan^{-1}\!\left( \frac{k\delta + d \sin\phi}{kr} \right) \right)    (23)

where d is the depth associated with the pixel in meters, W is the width of the panorama in pixels and k is defined in equation 15.

7. Depth Maps

Figure 4: Example color and depth frames output from the Jump Assembler.

The depth maps produced by the Jump Assembler are in the same format as the stitches: over-under equirectangular in which the left eye is on top. The depth stitches use an inverse depth encoding to minimize quantization artifacts, and the metric depth along a ray can be found from the greyscale values (in the range 0 to 1) as follows:

    \text{depth in meters} = \frac{\text{minimum encodable depth in meters}}{\text{value in depth stitch}}    (24)
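As a small worked illustration of equations (23) and (24), the following sketch (Python/NumPy again, names illustrative) decodes a normalized depth-stitch value to metric depth and estimates the corresponding disparity in pixels; it assumes k > 0, i.e. a pixel inside the stereo portion of the stitch.

import numpy as np

def decode_depth(value_in_depth_stitch, minimum_encodable_depth_in_meters):
    # Equation (24): inverse-depth encoded value in (0, 1] -> depth in meters.
    return minimum_encodable_depth_in_meters / value_in_depth_stitch

def disparity_in_pixels(depth_m, phi, W, r, delta, k):
    # Equation (23): disparity between the left and right eye panoramas, in pixels.
    #   depth_m : depth along the ray in meters (d in equation 23)
    #   phi     : latitude of the pixel in radians
    #   W       : width of the panorama in pixels
    #   r, delta, k : as defined in Section 6 (requires k > 0)
    return W / (2.0 * np.pi) * (np.pi - 2.0 * np.arctan((k * delta + depth_m * np.sin(phi)) / (k * r)))

# Example usage (values are illustrative):
# depth = decode_depth(0.5, minimum_encodable_depth_in_meters=0.3)   # 0.6 m
# disp = disparity_in_pixels(depth, phi=np.pi / 2, W=4096, r=0.025, delta=0.148, k=1.0)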

The value of minimum encodable depth in meters can be found in the stitch metadata. Using the projection described in Section 6, depth in meters can then be used to generate a 3D point for each pixel in the stitch.
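Putting Sections 6 and 7 together, a 3D point can be recovered for each depth-map pixel roughly as sketched below, reusing the panorama_to_ray and attenuation_k helpers from the earlier sketches. The pixel-to-angle mapping used here (θ increasing from the left edge of the stitch, φ measured from the bottom row) is an interpretation of Section 4 and is an assumption of this illustration.

import numpy as np

def depth_pixel_to_point(row, col, depth_value, W, H, r, delta, coverage,
                         minimum_encodable_depth_in_meters, left_eye=True):
    # Sketch: one pixel of a single-eye depth panorama -> a 3D point in world space.
    # Pixel centers are offset by 0.5 from the pixel corner (Section 4).
    theta = 2.0 * np.pi * (col + 0.5) / W            # assumed: theta grows left to right
    phi = np.pi * (H - (row + 0.5)) / H              # assumed: phi = 0 at the bottom row

    k = attenuation_k(phi, coverage)                                        # equation (15)
    origin, direction = panorama_to_ray(theta, phi, r, delta, k, left_eye)  # equations (12)-(13)

    depth_m = minimum_encodable_depth_in_meters / depth_value               # equation (24)
    return origin + depth_m * direction              # point at the given depth along the ray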

A. Example Stitch Metadata

An abbreviated example of a sample stitch metadata file is shown below:

{
  "major_version_number": 1,
  "minor_version_number": 1,
  "rig_type": "Halo",
  "rig_id": "ATSaZ63HTME",
  "interpupillary_distance_in_meters": 0.05,
  "missing_bottom_coverage_in_degrees": 32.5,
  "missing_top_coverage_in_degrees": 0,
  "minimum_encodable_depth_in_meters": 0.3,
  "distortion_delta_in_meters": 0.1479019945774904,
  "frames_per_second": 59.940059940059939,
  "mono_bottom_coverage_in_degrees": 0,
  "partial_stereo_bottom_coverage_in_degrees": 0,
  "missing_bottom_depth_coverage_in_degrees": 32.5,
  "mono_top_coverage_in_degrees": 5,
  "partial_stereo_top_coverage_in_degrees": 25,
  "missing_top_depth_coverage_in_degrees": 30,
  "cameras": [
    {
      "name": "camera01",
      "position": [0, -0, 0.14999999999999999],
      "orientation": [0.0067773935363294744, 0.02128360697196437, 1.5790957390209466],
      "projection_type": "fisheye",
      "image_size": [1920, 1440],
      "principal_point": [968.30373943599193, 704.98857987122119],
      "radial_distortion": [0.096251462704779911, -0.016483790420092587],
      "focal_length": 834.50935697536784
    },
    {
      "name": "camera02",
      "position": [0.057402514854763463, -0, 0.13858192987669302],
      "orientation": [0.31680907223631799, -0.28573457838845956, 1.5570910398280071],
      "projection_type": "fisheye",
      "image_size": [1920, 1440],
      "principal_point": [959.49326826870185, 698.67798012008973],
      "radial_distortion": [0.098776848879089912, -0.017666832605844663],
      "focal_length": 834.11484482529715
    },
    { ... }
  ]
}

B. Version History

Version 1.0: September 14, 2016.

Version 1.1: November 8, 2017. Added parameters which define a transition to a monoscopic stitch at the poles. Added details on projecting 3D points into the panorama and on estimating disparity in the panorama.