Fast Natural Feature Tracking for Mobile Augmented Reality Applications


Jong-Seung Park (1), Byeong-Jo Bae (2), and Ramesh Jain (3)

(1) Dept. of Computer Science & Eng., University of Incheon, Korea
(2) Hyundai MnSoft, Inc., Seoul, Korea
(3) Dept. of Computer Science, University of California, Irvine, CA, USA

Abstract. Fast natural feature tracking is essential for making markerless augmented reality applications practical on low-performance mobile devices. To speed up the natural feature tracking process, which includes computationally expensive procedures, we propose a novel fast tracking method using optical flow, aimed at mobile augmented reality applications. Experimental results show that the proposed method significantly reduces the computational cost and also stabilizes the camera pose estimation process.

Keywords: Tracking, Natural feature, Augmented reality, Optical flow

1 Introduction

Natural features are features from unprepared scenes. Natural feature tracking (NFT) requires many computationally expensive operations. Most previous natural feature tracking methods include heavy feature extraction and pattern matching procedures for each input image frame [1]. Classical NFT approaches repeat the feature extraction and matching procedures for every input frame: they try to match the features extracted from the input frame against each of the registered patterns until a successful pattern is found. These extraction and matching procedures carry a heavy computational cost that low-performance mobile devices can hardly afford.

To speed up the NFT process, we propose a novel fast tracking method using feature-based optical flow. We implemented the proposed method on mobile devices to run in real time so that it can be viably used in mobile augmented reality applications. Moreover, during tracking, we keep the total number of feature points constant by inserting new feature points in proportion to the number of vanished feature points. The basic principle of speeding up the tracking process is to invoke the feature extraction and matching procedures less often and to restrict the area in which new features are extracted to small subregions. Once part of the input frame is matched to a specific pattern, we initiate tracking of the matched features through the successive video frames. As long as the tracking is successful, we do not perform new feature extraction and matching procedures.
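The following Python/OpenCV sketch illustrates the bootstrap-then-track control flow just described. It is a minimal reading of the paper, not the authors' implementation: ORB stands in for the unspecified scale- and rotation-invariant detector (the experiments later use SURF), and grab_frame, match_against_patterns, and track_points are hypothetical helpers.

```python
import cv2

# ORB is a freely available stand-in for the scale/rotation-invariant
# detector the method assumes; the paper's experiments use SURF.
detector = cv2.ORB_create(nfeatures=500)

tracking, prev_gray, pts = False, None, None
while True:
    frame = grab_frame()                           # hypothetical camera helper
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if not tracking:
        # Bootstrap path (expensive): full extraction + pattern matching.
        kps, desc = detector.detectAndCompute(gray, None)
        pts = match_against_patterns(kps, desc)    # hypothetical; None if no match
        tracking = pts is not None
    else:
        # Steady-state path (cheap): optical-flow tracking only,
        # no per-frame extraction or matching.
        pts, ok = track_points(prev_gray, gray, pts)  # e.g. pyramidal Lucas-Kanade
        tracking = ok                              # fall back to bootstrap on failure
    prev_gray = gray
```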

2 Tracking Natural Features

Feature tracking is the process of finding, in the successive image, the positions that correspond to image points in the first image. The measure of correspondence is based on the similarity of an image neighborhood within a fixed-size window. A well-known method is the feature tracker developed by Lucas and Kanade [2]. Let I and J be two consecutive grayscale images, and let the scalar quantities I(x, y) and J(x, y) be the pixel intensities at image coordinates (x, y). The consecutive image frame J contains most parts of the first image frame I. The position (x, y) in I moves to a new position (x + d_x, y + d_y) in J. The tracker must determine the disparity vector (d_x, d_y) at (x, y) from the intensity similarity of I and J. The similarity criterion is measured over the set of local neighborhood pixels, denoted by W, centered at the position (x, y). The disparity is commonly computed by minimizing the residual error due to the brightness differences. This approach stably tracks small feature movements in video frames taken in rapid succession, but it is still not reliable enough when the feature movements between frames are farther apart in time.
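The paper describes this residual only in words; written out in the standard Lucas-Kanade formulation, the disparity is the minimizer of the sum of squared brightness differences over the window W centered at (x, y):

    \epsilon(d_x, d_y) = \sum_{(u,\,v) \in W} \bigl( I(u, v) - J(u + d_x,\, v + d_y) \bigr)^2,

and the minimizing (d_x, d_y) is taken as the feature's displacement between I and J.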

Bay et al. [3] proposed an interest point detector and descriptor called Speeded Up Robust Features (SURF) to reduce the time spent on feature computation and matching. They approximate the second-order Gaussian derivatives of the Hessian-Laplace detector with box filters, which can be evaluated very quickly using integral images. SURF is currently regarded as a potentially more efficient descriptor than previous descriptors such as SIFT and Ferns. However, the known NFT methods are still too slow to track features in a live video, and the situation is even worse on a mobile device.

To speed up tracking, several approaches place strict restrictions on the target object to be tracked. Some mobile AR applications use ARToolKit-like approaches, which track only black-and-white markers. However, such marker-based approaches are not preferable, especially for outdoor AR applications, since they significantly restrict service domains for the sake of speedup.

Feature tracking is a necessary preprocessing step for the structure-from-motion problem, which recovers the 3D structure of captured scenes from images sensed over time. Because the feature matches are the only preliminary information for further vision-based inference, conventional point-based tracking schemes try to find as many feature points as possible. Most previous natural feature tracking schemes have focused on the description and matching of features between two consecutive images. These methods extract a new set of point features from each newly arrived image instead of reusing previously matched features. Extracting and matching a new point set is time-consuming and should be avoided, especially in real-time applications. Our claim is that the previously matched features should be reused to speed up the tracking process.

In vision-based augmented reality applications, the purpose of natural feature tracking is to compute a homography between a planar scene and a projected image. To ensure that a pattern can be identified, there must be a large number of feature points on the planar scene pattern, and a sufficient number of those feature points must be matched to points in the projected image. To identify the rectangular region of the projected pattern, a homography is computed from the matched point pairs. An application then uses the homography for further service-specific processing.

Conventional feature tracking approaches find correspondences between two consecutive images, namely I_t and I_{t+1}. They extract an initial set of feature points from the first frame and track their movements along the consecutive image frames. In contrast, the natural feature tracking approach in AR domains finds correspondences between an on-the-fly image and one of the pre-registered patterns. Previous NFT methods have focused on matching two point sets and extract new feature points from every on-the-fly image.

3 Speeding Up Feature Tracking

We found several cues for speeding up the tracking process. First, good features are not lost during tracking until they disappear from the field of view. This cue indicates that we do not have to extract features from every frame: the tracked feature positions in the next frame are likely to be detected as new feature locations anyway, so we can simply use the tracked locations instead of finding new ones. Second, it is not necessary to track a huge number of feature points to compute a homography, so we can reduce the feature matching time by limiting the number of features to be matched. Theoretically, a homography can be computed from only four correspondences; in practice, twenty points are enough to ensure robustness. Third, we can predict whether each feature point will disappear soon. This cue means that we can manage the set of tracked feature points efficiently: we can exclude feature points that will disappear soon and include new points that will stay in view for a long time. Based on these three cues, we invoke the feature extraction step less often and with fewer features.

The Lucas-Kanade feature tracker is a widely used feature tracking method [2]. It estimates the optical flow of each feature pixel by assuming that the flow is constant in a local neighborhood of the pixel. Bouguet [4] proposed a fast pyramidal implementation of the iterative Lucas-Kanade feature tracker [2]. The advantage of his pyramidal implementation is that the residual disparity vector at each hierarchical level can be kept very small, which allows large pixel motions while retaining fast convergence of the iterative tracker.
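As a concrete illustration (the paper gives no code), a single tracking step with a pyramidal Lucas-Kanade tracker in the spirit of Bouguet's implementation [4] can be written with OpenCV as below. The window size, pyramid depth, termination criteria, and the survivor threshold of 20 points (taken from the robustness remark above) are assumed values, not the authors' reported settings.

```python
import cv2

def track_points(prev_gray, next_gray, prev_pts):
    """Track (N, 1, 2) float32 points from prev_gray into next_gray."""
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_pts, None,
        winSize=(21, 21),   # fixed-size similarity window W
        maxLevel=3,         # pyramid levels keep the per-level disparity small
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    good = status.ravel() == 1
    # A homography needs only 4 correspondences; ~20 survivors are enough
    # for robustness, so tracking is declared successful if that many remain.
    return next_pts[good].reshape(-1, 1, 2), int(good.sum()) >= 20
```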

A classical NFT method has three main steps, as shown in Fig. 1. First, it acquires a new image frame and extracts features. Then, it performs feature matching between the extracted features and the features of the predefined patterns, which yields the matched pairs of feature points. Finally, it estimates the camera pose and performs image synthesis for rendering.

[Fig. 1. Comparison of the classical method (left: image acquisition & feature extraction, pattern matching, camera pose estimation & image synthesis, repeated for every frame) and the proposed method (right: a one-time bootstrap pattern matching step, then image acquisition & feature tracking followed by camera pose estimation & image synthesis).]

The feature extraction procedure is time-consuming since it involves several convolution operators that visit every pixel position, and it tends to extract an excessive number of features to avoid any failure in matching. To make matters worse, the feature extraction procedure is repeated for every input frame to obtain a new set of feature positions. The feature matching process must be repeated for each of the registered patterns, which means the matching step becomes slower as the number of patterns grows. It is therefore not appropriate to apply this classical NFT approach to low-performance mobile devices; instead, we need a new fast method with reduced overhead in feature extraction and matching.

The basic idea of our proposed method is to reduce the number of features by excluding unnecessary ones. In our method, as shown in Fig. 1, we perform the pattern matching procedure only once and track the matched features over the subsequent frames. Initially, we extract and describe features using a scale- and rotation-invariant interest point detector. The feature matching procedure tries to match the feature vectors against each of the predefined patterns. Once a matched pattern is found, we track the matched feature points over the next consecutive frames. Our tracking implementation is based on the pyramidal scheme of the Lucas-Kanade feature tracker.

4 Experimental Results

Experimental results show that the proposed method significantly reduces the computational cost and also stabilizes the camera pose estimation process. We captured a scene containing a specific pattern plate. While capturing the pattern plate, the camera was rotated, zoomed, and tilted, so the captured images of the pattern plate change according to the camera motion. We tested the accuracy of our algorithm when tracking the pattern in the captured frames. While tracking, we evaluated the homography between the input frame and the matched pattern using the matched pairs. The accuracy is measured as the sum of the differences between the corner positions predicted by the homography and the actual corner positions, which were specified manually.
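To make the accuracy measure concrete, the corner-based error described above could be computed as in the sketch below. Here pattern_image, pattern_pts, frame_pts, and true_corners are hypothetical inputs, and RANSAC with a 3-pixel reprojection threshold is an assumed choice rather than one stated in the paper.

```python
import cv2
import numpy as np

# pattern_pts / frame_pts: matched (N, 1, 2) float32 point sets, N >= 4.
H, _inliers = cv2.findHomography(pattern_pts, frame_pts, cv2.RANSAC, 3.0)

# Map the pattern's four corners into the input frame via H and compare
# them with manually specified ground-truth corner positions.
h, w = pattern_image.shape[:2]
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
projected = cv2.perspectiveTransform(corners, H)
error = float(np.sum(np.linalg.norm(projected - true_corners, axis=2)))
```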

In the SURF method [3] there are abrupt increases of error in some frames, whereas in the proposed method the error is stable throughout the input stream. We also compared the processing time of SURF and the proposed method on the three platforms; Table 1 shows the comparison results.

Table 1. Comparison of the processing time for a single video frame (in ms).

  method    #patterns  capture  extract   match    track   render    total
  SURF          1       1.053   193.940    9.902     -     2.194   207.089
  SURF          5       1.063   194.035   71.775     -     2.409   269.282
  SURF         10       1.052   192.756  218.301     -     2.409   414.518
  Proposed      1       1.042     1.168    0.061   25.635  1.316    29.222
  Proposed      5       1.032     1.633    0.492   27.764  1.339    32.260
  Proposed     10       1.050     1.176    1.231   25.851  1.334    30.642

In our test, the SURF method takes at least 35 ms per frame, and its feature extraction step takes more than half of the total processing time. In the proposed method, the feature extraction and matching time is significantly reduced.

5 Conclusions

The heavy computational burden of classical NFT approaches makes run-time execution of NFT prohibitive on low-performance mobile devices. To speed up NFT, we proposed a fast feature tracking method based on optical flow. The proposed algorithm can feasibly track natural features in unknown and time-varying outdoor environments. To guarantee continuity of tracking without increasing the time complexity, we introduced partial feature extraction and matching in image subregions: as long as the tracking is successful, further feature extraction and matching are performed only on the subregions that contain no features. Experimental results on real data sets show that the proposed method is more than 7 times faster than the SURF method; it not only provides a significant speedup but also maintains the same level of accuracy.

References

1. Lim, M.J., Jung, H.W., Lee, K.Y.: Game-type recognition rehabilitation system based on augmented reality through object understanding. The Journal of the Institute of Webcasting, Internet and Telecommunication 11(3) (2011) 93-98
2. Lucas, B., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: Proceedings of the 7th International Joint Conference on Artificial Intelligence (1981) 674-679
3. Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: SURF: Speeded up robust features. Computer Vision and Image Understanding 110(3) (2008) 346-359
4. Bouguet, J.-Y.: Pyramidal implementation of the Lucas-Kanade feature tracker. Technical report, Intel Corporation Microprocessor Research Labs (2000)