Real-Time Scene Reconstruction

Remington Gong, Benjamin Harris, Iuri Prilepov

June 10, 2010

Abstract

This report discusses the implementation of a real-time system for scene reconstruction. Algorithms for feature extraction, tracking, structure from motion, and mesh generation are introduced. We present experimental results demonstrating the performance of the KLT feature tracker running on the GPU. The report also touches on the eight-point algorithm for structure from motion and a solution to mesh generation, analyzing their fitness for a real-time system.

Contents

1 Introduction
2 KLT Feature Tracking on GPU
  2.1 Background
  2.2 Implementation
  2.3 Results
3 Eight Point Algorithm
4 Mesh Generation
5 Conclusions
Bibliography

1 Introduction

Our team took on the challenge of creating an application for capturing and recreating environments from photographs. Though the original challenge was to reconstruct objects from photographs, we opted to use video sequences instead and attempted to build a system that recreates a 3D model of an environment in real time. The problem can be broken into three parts. The first step is to detect feature points in a frame and then track them across consecutive frames. Next, numerical linear algebra techniques are used to infer the depth of the points in each camera view and generate a point cloud of the 3D model. Lastly, an attempt is made to generate the best-fitting surface from the point cloud. This is a long-term project that has not been completed. Currently, only the first step, feature tracking, is fully functional and exhibits real-time performance. Chapter 2 discusses feature tracking on the GPU; chapter 3 briefly discusses the eight-point algorithm used to solve for the point cloud and the current state of our implementation; chapter 4 describes what mesh generation is and how we would go about implementing it.
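As a rough illustration of this three-stage decomposition, the following C++ skeleton shows how the stages would compose in a per-frame loop. All types and function names here are illustrative placeholders, not the project's actual API.

// Sketch of the three-stage pipeline: track features per frame,
// periodically solve for structure, then fit a surface at the end.
// All types and functions are stubs for illustration only.
#include <vector>

struct Frame { /* pixel data would live here */ };
struct Feature { float x, y; int id; };
struct Point3D { float x, y, z; };

// Stage 1: detect/track features between adjacent frames (stubbed).
std::vector<Feature> trackFeatures(const Frame&, const std::vector<Feature>&) { return {}; }
// Stage 2: structure from motion on two tracked views (stubbed).
std::vector<Point3D> solveStructure(const std::vector<Feature>&, const std::vector<Feature>&) { return {}; }
// Stage 3: fit a surface to the accumulated point cloud (stubbed).
void generateMesh(const std::vector<Point3D>&) {}

int main() {
    std::vector<Feature> prev, curr;
    std::vector<Point3D> cloud;
    for (int i = 0; i < 100; ++i) {   // per-frame loop over the video
        Frame frame;                  // would be grabbed from the camera
        curr = trackFeatures(frame, prev);
        if (!prev.empty()) {
            auto pts = solveStructure(prev, curr);
            cloud.insert(cloud.end(), pts.begin(), pts.end());
        }
        prev = curr;
    }
    generateMesh(cloud);
}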

2 KLT Feature Tracking on GPU

Feature tracking is the first major stage of 3D scene reconstruction. Given a sequence of 2D images, a feature tracker identifies 2D points in each pair of adjacent frames that correspond to the same real-world features. The features can have any type of geometry, but in the application of scene reconstruction points are convenient, since the output of the tracker can then be used directly by structure-from-motion algorithms to generate a 3D point cloud.

2.1 Background

There are many ways to track features across image frames, but the most common method for real-time applications is the KLT feature tracking algorithm. This algorithm, developed by Kanade, Lucas, and Tomasi in [4, 7], uses a simple motion model for an image sequence: it assumes the camera moves approximately linearly between adjacent frames. This assumption greatly simplifies the mathematical representation of the problem. We model the input as a sequence of functions $I_1(\vec{x}), I_2(\vec{x}), \ldots, I_F(\vec{x})$ that relate the color intensities of each of $F$ frames to 2D pixel coordinates $\vec{x} = (x, y)$. The following equation, defined with a slightly different formulation in [7], represents the constraint that the KLT tracking algorithm imposes on frames $1 \le i < F$:

$$I_i(\vec{x}) = I_{i+1}(\vec{x} + \delta_i(\vec{x})) + \mathrm{noise}_i(\vec{x})$$

The function $\delta_i(\vec{x})$ maps pixel coordinates in each frame to the displacement toward the subsequent frame. The function $\mathrm{noise}_i(\vec{x})$ accounts for arbitrary noise in the intensity measurements of the camera. As Tomasi and Kanade point out in [7], if this factor is too large then the above constraint becomes extremely difficult to solve, and the KLT algorithm will likely fail.

These constraints reduce the tracking problem to two simple steps. The first step selects $N$ points from the first image, say $\vec{x}_{1,1}, \vec{x}_{1,2}, \ldots, \vec{x}_{1,N}$, that are suitable for tracking. In [7], Tomasi and Kanade show that points with high intensity changes in both the $x$ and $y$ directions (i.e., corner points) are the best features to select, since the intensity change is used in the second step as the basis for tracking. To select these good features, gradients of the image are calculated in both the $x$ and $y$ directions, and the minimum eigenvalue of the $2 \times 2$ gradient matrix is used to rate each pixel's trackability. The $N$ pixels with the highest trackability are chosen to be tracked.

The second step solves for $\delta_i(\vec{x})$ at each tracked point in frame $i$, where $1 \le i < F$. The location of each point in the subsequent frame can then be calculated as $\vec{x}_{i+1,j} = \vec{x}_{i,j} + \delta_i(\vec{x}_{i,j})$, where $1 \le j \le N$. The common approach to solving for the displacement uses the Newton-Raphson iterative method to minimize the error in the intensity difference between windows of pixels from each frame. If the linear motion model from the KLT assumption holds, this method quickly converges to the correct displacement function.

Putting these two steps together, it is possible to track $N$ feature points across $F$ image frames with only incremental computation. This property bodes well for real-time applications, which is likely why KLT tracking is the most common method of choice.
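For concreteness, the trackability score has a closed form. With windowed gradient sums $g_{xx} = \sum I_x^2$, $g_{xy} = \sum I_x I_y$, and $g_{yy} = \sum I_y^2$, the smaller eigenvalue of the gradient matrix is $\lambda_{\min} = \frac{1}{2}\left(g_{xx} + g_{yy} - \sqrt{(g_{xx} - g_{yy})^2 + 4 g_{xy}^2}\right)$. The following CUDA kernel is a minimal sketch of this per-pixel rating; the kernel and parameter names are our own illustrative choices, not the project's actual code.

#include <cuda_runtime.h>
#include <math.h>

// Rate each pixel's trackability as the minimum eigenvalue of the 2x2
// gradient matrix accumulated over a (2*R+1)^2 window. gx/gy are the
// precomputed image gradients. A sketch, not the project's actual kernel.
__global__ void minEigenvalueKernel(const float* gx, const float* gy,
                                    float* score, int width, int height,
                                    int R) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < R || y < R || x >= width - R || y >= height - R) return;

    float gxx = 0.f, gxy = 0.f, gyy = 0.f;
    for (int dy = -R; dy <= R; ++dy) {
        for (int dx = -R; dx <= R; ++dx) {
            float ix = gx[(y + dy) * width + (x + dx)];
            float iy = gy[(y + dy) * width + (x + dx)];
            gxx += ix * ix;  gxy += ix * iy;  gyy += iy * iy;
        }
    }
    // Smaller root of the characteristic polynomial of [[gxx,gxy],[gxy,gyy]].
    float tr = gxx + gyy;
    float d  = gxx - gyy;
    score[y * width + x] = 0.5f * (tr - sqrtf(d * d + 4.f * gxy * gxy));
}

A launch such as minEigenvalueKernel<<<grid, block>>>(gx, gy, score, w, h, 3), with 16 x 16 thread blocks covering the image, would produce the trackability map consumed by the feature selector.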

2.2 Implementation

Considering the massive parallelism in the above algorithm, we decided to develop a GPU implementation. Our first implementation attempt used the OpenCL API, but we later moved to NVIDIA's CUDA API. The implementation involves several components, most of which map well to the GPGPU architecture:

1. [GPU] Convolution filter to smooth images and calculate gradients.
2. [GPU] Minimum eigenvalue kernel to rate pixel trackability.
3. [CPU] Feature selector to find the best points to track.
4. [GPU] Feature tracker to solve for the displacement of feature points.

Performance in the OpenCL version was limited by the convolution component, since we did not consider GPU-specific optimizations. In the CUDA version we used several optimizations, such as loop unrolling, memory access coalescing, and caching in shared memory (a sketch of the shared-memory convolution approach appears at the end of this section). These lower-level optimizations, more accessible in the CUDA API, offered significant performance gains.

Another performance bottleneck was the copying of eigenvalues to the CPU for component 3, the sorting of eigenvalues on the CPU, and the copying of feature points back to the GPU for component 4. Our final solution uses a selection-heap-sort technique to sort and copy only the data that is needed, since most of the pixels of the input image are discarded before the tracking stage.

We also considered several quality enhancements for the KLT algorithm. We calculate a pyramid of images from each input image, such that each level of the pyramid is one fourth the size of the previous level. With sufficient smoothing and subsampling, these smaller pyramid levels provide the tracking component with initial estimates of the displacement function, allowing larger displacements to converge more quickly and avoiding convergence on local error minima.
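The shared-memory caching mentioned above is the key convolution optimization: each input pixel is loaded from global memory once per block, and the filter taps then read from fast on-chip memory. Below is a minimal sketch of the horizontal pass of a separable convolution in this style; the tile size, radius, and names are illustrative assumptions rather than the project's actual kernel.

#include <cuda_runtime.h>

#define RADIUS 4    // filter radius (9-tap kernel), illustrative
#define TILE   128  // threads per block for the row pass

__constant__ float c_kernel[2 * RADIUS + 1];  // filter taps in constant memory

// Horizontal pass of a separable convolution with the input row cached in
// shared memory, so each pixel is fetched from global memory only once per
// block. A sketch of the technique, not the project's actual kernel.
__global__ void convolveRows(const float* in, float* out,
                             int width, int height) {
    __shared__ float tile[TILE + 2 * RADIUS];

    int row = blockIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;

    // Load the tile plus a RADIUS-wide apron on each side, clamping at edges.
    int load = min(max(col - RADIUS, 0), width - 1);
    tile[threadIdx.x] = in[row * width + load];
    if (threadIdx.x < 2 * RADIUS) {
        int load2 = min(max(col - RADIUS + TILE, 0), width - 1);
        tile[threadIdx.x + TILE] = in[row * width + load2];
    }
    __syncthreads();

    if (col >= width) return;
    float sum = 0.f;
    #pragma unroll
    for (int k = -RADIUS; k <= RADIUS; ++k)  // taps read from shared memory
        sum += c_kernel[k + RADIUS] * tile[threadIdx.x + RADIUS + k];
    out[row * width + col] = sum;
}

Launched with dim3 grid((width + TILE - 1) / TILE, height) and TILE threads per block, this pass would be followed by an analogous vertical pass over columns.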

2.3 Results

Our resulting feature tracker attains reasonable real-time performance. On an NVIDIA 8600M GT laptop GPU, the tracker takes an average of 20 ms per frame to track 100 features in a 512 x 512 video sequence. With more powerful desktop cards we observed as much as 10 times this performance, clearly allowing for more features and larger video resolutions. Figures 1 and 2 give the profiler breakdown on the two cards; percentages are relative to the parent row.

Figure 1: NVIDIA 8600M GT

Total(ms)  Ave(ms)  Ave(%)   Name
25507      23.5     1.000    Frame
 4789       4.4     0.188      Frame Grabbing
10561      10.4     0.414      Feature Tracking
10549      10.4     0.999        Preprocessing
 1657       1.6     0.157          Copy to GPU
  819       0.8     0.078          Intensity Normalization
 1610       1.6     0.153          Smoothing
 6426       6.3     0.609          Pyramid
 2001       2.0     0.311            Downsampling
10143       9.4     0.398      Rendering

Figure 2: NVIDIA GTS 250

Total(ms)  Ave(ms)  Ave(%)   Name
 2843       2.3     1.000    Frame
 1454       1.2     0.511      Frame Grabbing
 1384       1.1     0.487      Feature Tracking
 1382       1.1     0.999        Preprocessing
  427       0.4     0.309          Copy to GPU
  133       0.1     0.096          Intensity Normalization
  181       0.1     0.131          Smoothing
  637       0.5     0.461          Pyramid
  189       0.2     0.297            Downsampling

3 Eight Point Algorithm

The eight-point algorithm (EPA) is a method for recovering a 3D point cloud from two views [1]. It is based on estimating the rotation and translation of the camera between the two views from the displacement of corresponding feature points in the two frames.

The notion of the fundamental matrix $F$ is central to the EPA. For every pair of points the following equation must hold:

$$p'^T F p = 0 \quad (1)$$

where $(p, p')$ is a pair of corresponding feature points from the two views,

$$p = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \qquad p' = \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}.$$

After rearranging the terms, equation (1) takes the form $Af = 0$, where $f$ is the vector of the nine entries of $F$ and each point correspondence contributes one row to

$$A = \begin{pmatrix} x'_1 x_1 & x'_1 y_1 & x'_1 & y'_1 x_1 & y'_1 y_1 & y'_1 & x_1 & y_1 & 1 \\ x'_2 x_2 & x'_2 y_2 & x'_2 & y'_2 x_2 & y'_2 y_2 & y'_2 & x_2 & y_2 & 1 \\ \vdots & & & & & & & & \vdots \end{pmatrix}$$

A least-squares approximation of $F$ can be found by computing a singular value decomposition $A = U \Sigma V^T$ and taking the elements of $F$ from the last column of $V$. Depending on the size of the matrix, it might be beneficial to compute the SVD on the GPU using the CULA library.

From the fundamental matrix, another important entity called the essential matrix $E$ can be extracted:

$$E = K^T F K \quad (2)$$

where $K$ is the $3 \times 3$ camera calibration matrix. Rotation and translation vectors can then be extracted from $E$ and used in triangulation to find the 3D coordinates of the scene points.

The current program for testing the implementation is built with the SDL library on Windows for working with OpenGL, and with the C++ TNT/JAMA linear algebra package for the SVD. Further work will require performance optimizations for this algorithm to be included in the real-time scene reconstruction system. Which frames from the video sequence to select as the view pairs for point cloud generation, and how to register the partial point clouds that capture the same subsets of 3D points, are some of the issues that need to be considered in the future. Additionally, other, more robust algorithms are to be researched and analyzed for performance.
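The estimation step above maps directly to a short routine. The sketch below uses the Eigen library for the SVD purely for illustration (our test program uses TNT/JAMA); it also enforces the rank-2 constraint on $F$ discussed in [2]. All names are our own.

#include <Eigen/Dense>
#include <vector>

// Estimate the fundamental matrix from >= 8 correspondences by solving
// Af = 0 in the least-squares sense via SVD. A sketch using Eigen, not
// the TNT/JAMA package the test program uses. q holds the primed points.
Eigen::Matrix3d eightPoint(const std::vector<Eigen::Vector2d>& p,
                           const std::vector<Eigen::Vector2d>& q) {
    const int n = static_cast<int>(p.size());
    Eigen::MatrixXd A(n, 9);
    for (int i = 0; i < n; ++i) {
        double x = p[i].x(), y = p[i].y();
        double xp = q[i].x(), yp = q[i].y();
        A.row(i) << xp * x, xp * y, xp,
                    yp * x, yp * y, yp,
                    x,      y,      1.0;
    }
    // f is the right singular vector for the smallest singular value,
    // i.e. the last column of V (Eigen sorts singular values descending).
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeFullV);
    Eigen::Matrix<double, 9, 1> f = svd.matrixV().col(8);
    Eigen::Matrix3d F;
    F << f(0), f(1), f(2),
         f(3), f(4), f(5),
         f(6), f(7), f(8);

    // Enforce rank 2, since a valid fundamental matrix is singular [2].
    Eigen::JacobiSVD<Eigen::Matrix3d> svdF(F, Eigen::ComputeFullU | Eigen::ComputeFullV);
    Eigen::Vector3d s = svdF.singularValues();
    s(2) = 0.0;
    return svdF.matrixU() * s.asDiagonal() * svdF.matrixV().transpose();
}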

An attempt to reconstruct a cube has so far resulted in the point cloud shown in Figure 3.

Figure 3: Test structure-from-motion output

4 Mesh Generation

The first step was to compute a 3D convex hull. We decided to use Timothy Chan's 3D convex hull generator. His algorithm is a modified version of the divide-and-conquer algorithm; the approach is to view the 3D convex hull problem as a kinetic 2D problem. To solve it as a kinetic 2D problem, the 3D points are projected into 2D, the convex hull is found, and the points are projected back to 3D. Finding the lower and upper hulls is symmetric, so code need only be written for one half (the 2D analogue of this symmetry is sketched at the end of this chapter). The divide-and-conquer applies to each of these hulls: each half of the hull is split in half, the hull of each quarter is found, and then all the hulls are bridged together at the end.

The input is a list of 3D points and the output is an array of events, which represents the order in which the points are inserted into the convex hull. Figure 4 shows a sample output from the algorithm; the input was a point cloud of a birdhouse. Our understanding of what the algorithm outputs is evidently not yet up to par: the points that are output should belong to the set of input points, but after examining the output we find that the resulting points are not from the point cloud data.

Figure 4: Test convex hull output
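The lower/upper symmetry is easiest to see in the 2D analogue. The sketch below is Andrew's monotone chain (not Chan's kinetic algorithm): a single half-hull routine builds the lower hull on a left-to-right pass and the upper hull on the reversed pass.

#include <algorithm>
#include <vector>

// 2D illustration of the lower/upper-hull symmetry: one half-hull routine
// serves for both halves by reversing the traversal order. This is Andrew's
// monotone chain, shown only to illustrate the symmetry, not Chan's algorithm.
struct Pt { double x, y; };

static double cross(const Pt& o, const Pt& a, const Pt& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Build one half-hull over points given in traversal order.
static std::vector<Pt> halfHull(const std::vector<Pt>& pts) {
    std::vector<Pt> h;
    for (const Pt& p : pts) {
        while (h.size() >= 2 && cross(h[h.size() - 2], h.back(), p) <= 0)
            h.pop_back();  // drop points that make a non-left turn
        h.push_back(p);
    }
    return h;
}

std::vector<Pt> convexHull(std::vector<Pt> pts) {
    if (pts.size() < 3) return pts;
    std::sort(pts.begin(), pts.end(),
              [](const Pt& a, const Pt& b) {
                  return a.x < b.x || (a.x == b.x && a.y < b.y);
              });
    std::vector<Pt> lower = halfHull(pts);   // left-to-right pass
    std::reverse(pts.begin(), pts.end());
    std::vector<Pt> upper = halfHull(pts);   // right-to-left pass
    lower.pop_back();                        // endpoints are shared
    upper.pop_back();
    lower.insert(lower.end(), upper.begin(), upper.end());
    return lower;
}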

5 Conclusions

We have achieved only partial success in the implementation of our system. However, after analyzing the algorithms used in each of the stages, we have come to the conclusion that a real-time scene reconstruction system is feasible. The obvious trade-off between performance and quality dictates that the quality of the 3D models will suffer. But judging by the rapid pace of research in scene reconstruction, we strongly believe that in the near future it will be possible to have a high-quality real-time 3D scene reconstruction system running on conventional hardware.

Bibliography

[1] Longuet-Higgins, H. C. (1981). A computer algorithm for reconstructing a scene from two projections. Nature, 293(10), 133-135.

[2] Hartley, R. (1997). In defense of the eight-point algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6), 580-593.

[3] Hwangbo, M., Kim, J., & Kanade, T. (2009). Inertial-aided KLT feature tracking for a moving camera. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1909-1916.

[4] Lucas, B., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In International Joint Conference on Artificial Intelligence, pp. 674-679.

[5] Shi, J., & Tomasi, C. (1994). Good features to track. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR '94), pp. 593-600.

[6] Sinha, S. N., Frahm, J., Pollefeys, M., & Genc, Y. (2006). Feature tracking and matching in video using programmable graphics hardware. In Workshop on Edge Computing Using New Commodity Architectures.

[7] Tomasi, C., & Kanade, T. (1991). Detection and tracking of point features. International Journal of Computer Vision, 9, 137-154.

[8] Tomasi, C., Zhang, J., & Redkey, D. (1995). Experiments with a real-time structure-from-motion system. In Fourth International Symposium on Experimental Robotics (ISER '95).

Prepared in LaTeX 2ε by RCJ