Mobile Point Fusion. Real-time 3D surface reconstruction from depth images on a mobile platform

Mobile Point Fusion. Real-time 3D surface reconstruction from depth images on a mobile platform. Aaron Wetzler. Presenting: Daniel Ben-Hoda. Supervisors: Prof. Ron Kimmel, Gal Kamar, Yaron Honen. Supported by grant 267414.

Motivation. Dense 3D reconstruction is a fundamental problem in the field of computer vision. Currently there are limited solutions for real-time dense 3D reconstruction on mobile devices. Dense 3D reconstruction on mobile devices can be used to create 3D scanners and augmented reality apps for SMBs and personal use. Depth cameras are becoming cheaper and more readily available, and are starting to appear in mobile devices. The graphical and computational power of mobile devices is increasing. Mobile Point Fusion = dense 3D reconstruction from a depth camera, in real time, on mobile devices.

System Setup. A Structure depth sensor attached to an iPad (Retina).

The UI. Views shown: rendered model, RGB, depth, and normals.

Pipeline and System Overview. The pipeline continuously receives depth images from the sensor. Using the global model and the new data, the camera pose is estimated. The global model is then updated with the new data, and the updated global model is rendered to the screen from the new point of view.
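
To make the ordering concrete, here is a minimal C++ sketch of this per-frame loop; every type and stage function below is a hypothetical placeholder for the steps described above, not the actual implementation.

```cpp
// Minimal, self-contained sketch of the per-frame loop; all types and stage
// functions are hypothetical placeholders for the steps described above.
#include <cstdio>

struct DepthFrame {};   // filtered depth plus per-pixel position / normal / radius
struct Pose {};         // 6 DOF camera pose
struct Model {};        // global point-based model (kept on the GPU in the real system)

DepthFrame preprocess(const DepthFrame& raw)                  { return raw; }   // filter, unproject, normals, radii
Pose estimatePose(const Model&, const DepthFrame&, Pose last) { return last; }  // ICP against the rendered model
void fuse(Model&, const DepthFrame&, const Pose&)             {}                // merge / add / remove points
void render(const Model&, const Pose&)                        { std::puts("render"); }

int main() {
    Model model;
    Pose pose{};
    for (int frame = 0; frame < 3; ++frame) {   // stand-in for the live depth stream
        DepthFrame raw{};                       // next depth image from the sensor
        DepthFrame data = preprocess(raw);
        pose = estimatePose(model, data, pose);
        fuse(model, data, pose);
        render(model, pose);
    }
    return 0;
}
```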

Model Representation. Our system adopts a purely point-based representation, which eliminates the need for spatial and structural data structures: the global model is a simple point cloud. Points are assumed to be oriented circular disks. Each point is allocated a unique ID upon first encounter; the ID corresponds to the index of the OpenGL GL_POINT primitive representing the point, and the system recognizes previously encountered points using these IDs. Each time a point is encountered, the system's confidence in that point is increased, and the system distinguishes between stable and unstable model points according to that point's confidence.

Model Attributes. The model data structure is maintained on the GPU. Each point is represented by the following attributes: ID i ∈ ℕ⁺; Position v_i ∈ ℝ³; Normal n_i ∈ ℝ³; Radius r_i ∈ ℝ; Confidence c_i ∈ ℝ, a confidence counter (points with c_i ≥ c_stable are considered stable); Timestamp t_i ∈ ℕ, the last iteration in which this point was modified.
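
As a rough illustration, the per-point attributes could be mirrored by a struct like the one below; the field names and the stability threshold value are illustrative assumptions, since in the actual system the data lives in GPU buffers and textures addressed by the point ID.

```cpp
#include <cstdint>

// One point of the global model. In the real pipeline these attributes are
// stored in GPU buffers/textures and addressed by the point's ID; this struct
// only mirrors the logical layout described above.
struct ModelPoint {
    uint32_t id;          // unique ID, also the index of the GL_POINT primitive
    float    position[3]; // v_i in R^3
    float    normal[3];   // n_i in R^3, unit length
    float    radius;      // r_i, radius of the oriented disk
    float    confidence;  // c_i, incremented on every re-observation
    uint32_t timestamp;   // t_i, last iteration in which the point was modified
};

// A point is considered stable once its confidence passes a fixed threshold.
constexpr float kStableConfidence = 10.0f;   // illustrative value, not from the slides
inline bool isStable(const ModelPoint& p) { return p.confidence >= kStableConfidence; }
```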

Depth Image Preprocessing. Depth images are filtered to remove sensor noise. For each depth measurement the following is computed: 3D position, normal, and radius. The preprocessing flow: input depth images → filtering → 3D position extraction → normal estimation → radius estimation → camera pose estimation.

Bilateral Filtering. Depth images are filtered using a Gaussian bilateral filter to remove sensor noise. For each pixel distance possible in the filter box, Gaussian values are pre-calculated. Gaussian values are also pre-calculated for the range differences of 0-256.
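
A possible CPU-side sketch of such a filter with the two precomputed lookup tables is given below. The kernel radius, the sigma values, and the use of an 8-bit quantized depth image (to match the 0-256 range-difference table) are assumptions of this sketch, not values taken from the slides.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Gaussian bilateral filter for an 8-bit-quantized depth image, with the two
// lookup tables described above: one over pixel offsets inside the filter box,
// one over the 256 possible range differences. Sigmas and radius are assumed values.
std::vector<uint8_t> bilateralFilter(const std::vector<uint8_t>& depth,
                                     int width, int height,
                                     int radius = 3,
                                     float sigmaSpace = 3.0f,
                                     float sigmaRange = 20.0f) {
    const int size = 2 * radius + 1;

    // Spatial LUT: one Gaussian weight per possible pixel offset in the filter box.
    std::vector<float> spatialLUT(size * size);
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx)
            spatialLUT[(dy + radius) * size + (dx + radius)] =
                std::exp(-(dx * dx + dy * dy) / (2.0f * sigmaSpace * sigmaSpace));

    // Range LUT: one Gaussian weight per possible depth difference.
    float rangeLUT[256];
    for (int d = 0; d < 256; ++d)
        rangeLUT[d] = std::exp(-(d * d) / (2.0f * sigmaRange * sigmaRange));

    std::vector<uint8_t> out(depth.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const uint8_t center = depth[y * width + x];
            float sum = 0.0f, weightSum = 0.0f;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    uint8_t sample = depth[ny * width + nx];
                    float w = spatialLUT[(dy + radius) * size + (dx + radius)] *
                              rangeLUT[std::abs(sample - center)];
                    sum += w * sample;
                    weightSum += w;
                }
            }
            out[y * width + x] = static_cast<uint8_t>(sum / weightSum + 0.5f);
        }
    }
    return out;
}
```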

Bilateral Filtering. Comparison figure: raw depth image vs. filtered depth image.

Computing Camera-Space Position. Given the supplied OpenGL projection matrix P, which simulates the projection of the camera onto the image plane, the reverse OpenGL projection process is performed.
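
As a hedged illustration, the same back-projection can be written in the equivalent pinhole-camera form instead of inverting the full OpenGL projection matrix, which is what the actual system does; fx, fy, cx, cy are assumed camera intrinsics.

```cpp
struct Vec3 { float x, y, z; };

// Back-project a depth pixel into camera space. The slides do this by inverting
// the OpenGL projection matrix; this sketch uses the equivalent pinhole-camera
// form with assumed intrinsics (fx, fy: focal lengths in pixels; cx, cy: principal point).
Vec3 unproject(int u, int v, float depth,
               float fx, float fy, float cx, float cy) {
    // depth is the metric distance along the camera's z axis for pixel (u, v)
    return { (u - cx) * depth / fx,
             (v - cy) * depth / fy,
             depth };
}
```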

Normal Estimation. Surface normals are estimated from neighboring pixels by calculating central differences over the camera-space positions p_i^c of the 3×3 neighborhood around the center pixel p5: d_x = p8^c − p2^c, d_y = p6^c − p4^c, n_p5^c = d_x × d_y.
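
A small sketch of this computation, assuming p1..p9 number the camera-space positions of the 3×3 neighborhood row by row (an indexing assumption of this sketch):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
inline Vec3 cross(Vec3 a, Vec3 b) { return { a.y * b.z - a.z * b.y,
                                             a.z * b.x - a.x * b.z,
                                             a.x * b.y - a.y * b.x }; }
inline Vec3 normalize(Vec3 a) {
    float len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    return { a.x / len, a.y / len, a.z / len };
}

// Normal at the center pixel p5 of a 3x3 neighborhood of camera-space positions,
// using the central differences from the slide: d_x = p8 - p2, d_y = p6 - p4.
Vec3 estimateNormal(const Vec3 p[9]) {   // p[0..8] correspond to p1..p9, row-major
    Vec3 dx = sub(p[7], p[1]);           // p8 - p2, as in the slide
    Vec3 dy = sub(p[5], p[3]);           // p6 - p4, as in the slide
    return normalize(cross(dx, dy));
}
```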

Radii Estimation. Comparing solid angles provides an upper limit on the radius of the surface patch that was projected onto a pixel. Radii are only reduced over time, as more detailed measurements are acquired; therefore the radii are initialized to their respective upper limits: δA = (cos α / cos θ) · (z / f)² · δI, where α is the angle between the viewing ray and the optical axis, θ the angle between the viewing ray and the surface normal, z the depth, f the focal length, and δI the pixel area.
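
A sketch of this upper bound; converting the patch area into a disk radius via r = sqrt(A / π) is an added assumption of this sketch, not a formula stated on the slide.

```cpp
#include <cmath>

// Upper bound on a point's radius from the solid-angle argument above:
//   deltaA = (cos(alpha) / cos(theta)) * (z / f)^2 * deltaI
// alpha:  angle between the viewing ray and the optical axis
// theta:  angle between the viewing ray and the surface normal
// z:      depth of the measurement, f: focal length in pixels, deltaI: pixel area
// Turning the patch area into a disk radius via r = sqrt(area / pi) is an
// assumption of this sketch.
float estimateRadius(float alpha, float theta, float z, float f, float deltaI = 1.0f) {
    float area = (std::cos(alpha) / std::cos(theta)) * (z / f) * (z / f) * deltaI;
    return std::sqrt(area / 3.14159265f);
}
```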

Depth Image Preprocessing. Attributes of new points are saved into textures for use at later stages: the depth image, the positions, and the normals + radii.

Camera Pose Estimation. The current 6 DOF camera pose is estimated by aligning the new depth image and the global model as viewed from the last known position of the camera. The alignment is performed using Iterative Closest Point (ICP), which finds the 6 DOF transformation between two point clouds by finding correspondences and computing the transformation that minimizes a chosen error metric. The steps: collect visible stable model points, find correspondences, solve a linear least-squares problem, apply the transformation to the model.

ICP Preprocessing. Stable model points are rendered from the last known point of view, and the IDs of these points are saved into a texture. Using this texture, the IDs of the visible stable model points are collected for further rendering.

Finding Correspondences. Correspondences are found using projective association: the visible stable model points previously collected are projected onto the new depth image, and if a model point is projected onto a pixel containing a depth measurement, a correspondence is found. Correspondences are rejected if the angle between the normals is greater than δ_normal, or if the Euclidean distance is greater than δ_distance.
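
A minimal sketch of this projective association for a single model point; the pinhole intrinsics, the two thresholds, and the treatment of non-positive depth as "no measurement" are assumptions of this sketch.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline float dist(Vec3 a, Vec3 b) {
    Vec3 d{ a.x - b.x, a.y - b.y, a.z - b.z };
    return std::sqrt(dot(d, d));
}

// Projective association of one visible stable model point with the new depth frame.
// modelPos/modelNormal are already transformed into the current camera estimate;
// framePos/frameNormal are the preprocessed per-pixel measurements.
// deltaNormal (radians) and deltaDistance (meters) are assumed thresholds.
bool findCorrespondence(Vec3 modelPos, Vec3 modelNormal,
                        const Vec3* framePos, const Vec3* frameNormal,
                        int width, int height,
                        float fx, float fy, float cx, float cy,
                        float deltaNormal, float deltaDistance,
                        int* outPixel) {
    if (modelPos.z <= 0.0f) return false;                    // behind the camera

    // Project the model point onto the depth image (pinhole model, assumed intrinsics).
    int u = static_cast<int>(fx * modelPos.x / modelPos.z + cx + 0.5f);
    int v = static_cast<int>(fy * modelPos.y / modelPos.z + cy + 0.5f);
    if (u < 0 || v < 0 || u >= width || v >= height) return false;

    int idx = v * width + u;
    if (framePos[idx].z <= 0.0f) return false;               // no depth measurement at this pixel

    // Reject by normal angle and Euclidean distance.
    if (dot(modelNormal, frameNormal[idx]) < std::cos(deltaNormal)) return false;
    if (dist(modelPos, framePos[idx]) > deltaDistance) return false;

    *outPixel = idx;
    return true;
}
```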

Finding the Transformation. The desired 6 DOF transformation is the one that minimizes a chosen error metric. In order to accelerate convergence, the point-to-plane error metric is chosen. Computing the desired transformation requires solving a non-linear least-squares problem, which is approximated by a linear one to reduce computation time.
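
The sketch below accumulates the 6×6 normal equations of that linearized point-to-plane problem, following Low's small-angle formulation cited in the bibliography; solving the resulting system (e.g. by Cholesky decomposition) and composing the incremental transform are left out.

```cpp
#include <array>

struct Vec3 { float x, y, z; };
struct Correspondence { Vec3 src; Vec3 dst; Vec3 n; };  // model point, depth point, depth-point normal

inline Vec3 cross(Vec3 a, Vec3 b) { return { a.y * b.z - a.z * b.y,
                                             a.z * b.x - a.x * b.z,
                                             a.x * b.y - a.y * b.x }; }
inline float dot(Vec3 a, Vec3 b)  { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Accumulate the 6x6 normal equations A^T A x = A^T b of the linearized
// point-to-plane objective (small-angle approximation): each correspondence
// contributes the row [src x n, n] and the right-hand side n . (dst - src).
// The unknowns are x = (alpha, beta, gamma, tx, ty, tz). Solving the 6x6 system
// is omitted from this sketch.
void accumulateNormalEquations(const Correspondence* corrs, int count,
                               std::array<std::array<double, 6>, 6>& AtA,
                               std::array<double, 6>& Atb) {
    for (int i = 0; i < count; ++i) {
        Vec3 c = cross(corrs[i].src, corrs[i].n);
        double row[6] = { c.x, c.y, c.z, corrs[i].n.x, corrs[i].n.y, corrs[i].n.z };
        Vec3 diff{ corrs[i].dst.x - corrs[i].src.x,
                   corrs[i].dst.y - corrs[i].src.y,
                   corrs[i].dst.z - corrs[i].src.z };
        double b = dot(corrs[i].n, diff);   // point-to-plane residual moved to the right-hand side
        for (int r = 0; r < 6; ++r) {
            Atb[r] += row[r] * b;
            for (int col = 0; col < 6; ++col) AtA[r][col] += row[r] * row[col];
        }
    }
}
```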

Applying the Transformation. The computed transformation is applied to the model, and the process is repeated until sufficient accuracy is reached. Empirical data shows that the convergence of the point-to-plane ICP variant is not monotonic; therefore correspondences are re-computed every several iterations.

Updating the Global Model. Once the global model and the new depth image are aligned, the new data is fused into the global model. The model is updated using the following steps: the model is rendered in high resolution (super-sampled rendering), point fusion, and point removal.

Super-sampled Rendering. The visible model points' IDs and attributes are rendered to textures to be used in the following steps. Each point is rendered into a single texel to reveal the actual surface sample distribution. The model is rendered at 16 times the depth sensor's native resolution, so each of the camera's pixels corresponds to a 4×4 neighborhood in the super-sampled textures.

Point Fusion. For each new point, a single model point is matched. The matching point is chosen from the corresponding 4×4 neighborhood as follows (see the sketch below): discard points more than ±δ_depth from the viewing ray, adjusted according to sensor uncertainty; discard points whose normals form an angle larger than δ_normal; from the remaining points, select those with the highest confidence; from those, select the point closest to the viewing ray.
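
A sketch of this candidate selection; the thresholds are assumed, and reading the ±δ_depth criterion as a depth difference along the viewing ray is an interpretation made for this sketch rather than a statement from the slides.

```cpp
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct RenderedPoint {      // one texel of the super-sampled model rendering
    uint32_t id;
    Vec3     position;      // camera-space position
    Vec3     normal;        // unit normal
    float    confidence;
    bool     valid;         // false if no model point landed in this texel
};

// Choose which model point in the 4x4 neighborhood a new measurement merges with,
// following the criteria listed above. rayDir is the unit viewing ray of the
// camera pixel. Returns -1 if no candidate survives, i.e. the measurement is
// added to the model as a new point.
int selectFusionCandidate(const RenderedPoint neighborhood[16],
                          Vec3 newPos, Vec3 newNormal, Vec3 rayDir,
                          float deltaDepth, float deltaNormal) {
    const float newDepth = dot(newPos, rayDir);
    int best = -1;
    float bestConfidence = -1.0f;
    float bestRayDist2 = 0.0f;
    for (int i = 0; i < 16; ++i) {
        const RenderedPoint& p = neighborhood[i];
        if (!p.valid) continue;
        float depth = dot(p.position, rayDir);
        if (std::fabs(depth - newDepth) > deltaDepth) continue;            // depth criterion
        if (dot(p.normal, newNormal) < std::cos(deltaNormal)) continue;    // normal criterion
        float rayDist2 = dot(p.position, p.position) - depth * depth;      // squared distance to the viewing ray
        // Prefer the highest confidence; break ties by proximity to the viewing ray.
        if (p.confidence > bestConfidence ||
            (p.confidence == bestConfidence && rayDist2 < bestRayDist2)) {
            best = i;
            bestConfidence = p.confidence;
            bestRayDist2 = rayDist2;
        }
    }
    return best;
}
```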

Point Fusion. The matching point's attributes and the new measurement are averaged according to their respective confidence values. If no match is found, the point is treated as a new point and added to the model. The CPU reads the averaged values and updates the global model. (Fusion figure: green points are stable.)
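
A minimal sketch of the confidence-weighted average for the position; the normal and radius would be averaged the same way (with the normal re-normalized afterwards), and in the real system this runs on the GPU.

```cpp
struct Vec3 { float x, y, z; };

// Confidence-weighted running average used when a new measurement is merged
// into an existing model point.
void fusePoint(Vec3& position, float& confidence, Vec3 newPosition, float newWeight = 1.0f) {
    float total = confidence + newWeight;
    position.x = (confidence * position.x + newWeight * newPosition.x) / total;
    position.y = (confidence * position.y + newWeight * newPosition.y) / total;
    position.z = (confidence * position.z + newWeight * newPosition.z) / total;
    confidence = total;   // every re-observation increases the point's confidence
}
```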

Point Removal. In addition to merging and adding new points, under certain conditions model points need to be removed. Points that remain unstable for a long period of time are likely outliers or artifacts from moving objects, and are therefore removed from the model. Stable model points that were merged with new data have their positions adjusted; therefore, model points that lie in front of these points from the current point of view are removed, as they are free-space violations, and neighboring points that have very similar position and normal and overlapping spheres are redundant and are removed to further simplify the model.

Point Removal. To identify the points for removal, the super-sampled textures are used: for each stable model point that was merged with new data, the corresponding 4×4 neighborhood is searched for removal candidates, and the IDs of points that need to be removed are rendered to a texture. The model is rendered using indexed rendering throughout the pipeline, so each time the model is rendered a list of indices (the vertex IDs) is passed to the GPU. Removing a point from the model can therefore be executed by simply removing the point's ID from the list of model indices.

Point Removal. To prevent fragmentation of the model memory, the following method is employed: each time a point is removed, its index is inserted into a FIFO queue; when a new point needs to be added to the model, it is written to the memory corresponding to the index at the head of the queue; if the queue is empty, the point is inserted after the last model point in memory.
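
A small sketch of such an index free list, assuming std::queue as the FIFO; the class and member names are illustrative.

```cpp
#include <cstdint>
#include <queue>

// Free list of model slots, as described above: removed points donate their
// index, new points reuse it, and only when the queue is empty does the model
// grow at the end of memory.
class ModelIndexAllocator {
public:
    explicit ModelIndexAllocator(uint32_t initialCount) : next_(initialCount) {}

    void release(uint32_t index) { freed_.push(index); }    // point removed from the model

    uint32_t acquire() {                                    // slot for a newly added point
        if (!freed_.empty()) {
            uint32_t index = freed_.front();
            freed_.pop();
            return index;
        }
        return next_++;                                     // append after the last model point
    }

private:
    std::queue<uint32_t> freed_;
    uint32_t next_;
};
```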

Comparison to KinectFusion. As opposed to KinectFusion, which employs a volumetric data structure, Mobile Point Fusion uses a purely point-based representation throughout the pipeline. This enables scanning of large scenes; resolution is limited only by the depth sensor; computation time is less dependent on model size and resolution; and the overhead of converting between representations is eliminated. It does not create a watertight 3D model (but one can later be generated offline).

Results

Accomplishments and Unresolved Issues. Mobile Point Fusion has demonstrated how point-based fusion can be efficiently performed on mobile devices. The main unresolved issue is the ICP correspondences. In order to adapt the system to the mobile platform, the model representation had to be modified; this simpler representation, which uses the GL_POINT primitive, has the drawback that for each point the depth of all fragments is equal. This inaccuracy might be the cause of the ICP correspondence failure. OpenGL ES 3.0 limits the use of floating-point textures to 16-bit components; this accuracy limit has caused various numerical issues.

Future Work. Performance optimizations: utilize the GPU to perform the bilateral filtering; avoid expensive CPU texture reads in the point removal stages; utilize mipmapping to accelerate various stages of the pipeline. Implement the Dynamic Estimation stage. Integrate RGB data into the pipeline. Offload old model data to HDD for later use.

Bibliography
[1] Maik Keller, Damien Lefloch, Martin Lambers, Shahram Izadi, Tim Weyrich, Andreas Kolb. Real-time 3D Reconstruction in Dynamic Scenes Using Point-based Fusion. In International Conference on 3D Vision (3DV), 2013.
[2] C. Tomasi, R. Manduchi. Bilateral Filtering for Gray and Color Images. In Proceedings of the 1998 IEEE International Conference on Computer Vision, pages 839-846.
[3] Kok-Lim Low. Linear Least-Squares Optimization for Point-to-Plane ICP Surface Registration. Technical Report TR04-004, Department of Computer Science, University of North Carolina at Chapel Hill, February 2004.
[4] Szymon Rusinkiewicz, Marc Levoy. Efficient Variants of the ICP Algorithm. 2001.
[5] Shahram Izadi, David Kim, Otmar Hilliges, David Molyneaux, Richard A. Newcombe, Pushmeet Kohli, Jamie Shotton, Steve Hodges, Dustin Freeman, Andrew J. Davison, Andrew Fitzgibbon. KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera. 2011.
[6] G. Guennebaud, M. Paulin. Efficient Screen Space Approach for Hardware Accelerated Surfel Rendering. November 2003.
[7] Thibaut Weise, Thomas Wismer, Bastian Leibe, Luc Van Gool. In-Hand Scanning with Online Loop Closure. 2009.
[8] M. Lindenbaum. 236873 Computer Vision, Technion course, 2014.
[9] https://www.opengl.org/wiki/compute_eye_space_from_window_space, Compute Eye Space from Window Space.
[10] http://www.songho.ca/opengl/gl_projectionmatrix.html, OpenGL Projection Matrix.

Thank You