252C: Camera Stabilization for the Masses


Sunny Chow
Department of Computer Science, University of California, San Diego
skchow@cs.ucsd.edu

1 Introduction

As the use of handheld digital camcorders increases among the general population, the problem of undesired motion being introduced into captured video sequences becomes ever more prevalent. Even the slightest movement on the part of the user introduces these undesired motions, and unfortunately users find it difficult to hold a camera absolutely still for any length of time. These undesired motions distract from the intended subject of the video, and can potentially ruin an entire sequence. The user will undoubtedly want such motions removed while the desired motions are preserved. Luckily, the camcorder's digital format lends itself very nicely to a post-processing step that removes them.

We introduce a method for removing unwanted motion by characterizing desired motion as smooth transitions, with gradual starts and stops, that occur over some span of time, and undesired motion as anything that detracts from this ideal. A video sequence can then be corrected by smoothing its motion over time. Smoothing requires some representation of motion; since a planar homography maps points from one video frame to another, it essentially describes the transformation (the motion) between the two frames. We therefore calculate a homography for each sequential pair of frames, and by accumulating and manipulating the entries of these matrices we construct a set of ideal homographies representing only the desired motion. Each pair of frames is then forced to respect its corresponding ideal homography, and the resulting frames are recombined into a corrected video sequence containing only the desired motion.
The problem remains of defining exactly how gradual the starts and stops, and how smooth the transitions, must be for a motion to be classified as desired. Unfortunately, the ideal settings of these parameters are entirely subjective and depend on each individual user. We decided that any motion lasting less than 4 seconds is unwanted, and any motion lasting more than 4 seconds is desired. The method as described is, however, general enough to accommodate other definitions of desired and undesired motion, so long as the definition can be represented as a set of ideal homographies.

2 Initial Motion Estimation

The first step of our algorithm estimates a homography between each pair of successive frames in the video sequence. The homography H_k between frames k and k+1 is defined as

    H_k = [ h_11^k  h_12^k  h_13^k ;
            h_21^k  h_22^k  h_23^k ;
            h_31^k  h_32^k  h_33^k ]

Given frames f_1, f_2, ..., f_k, ..., f_N, with N the total number of frames in the sequence, Forstner features [1] are detected in frames f_1, f_2, ..., f_(N-1). For each set of features in frame f_k, a set of point correspondences is found in frame f_(k+1) via normalized correlation. Hartley normalization [2] is applied to each set of matching points, and a Direct Linear Transform aided by RANSAC [3] is used to calculate the homography H_k between each pair of frames f_k, f_(k+1).

This approach to calculating homographies implicitly makes a few assumptions about the captured scene. The scene must remain relatively static, as any object moving independently of the camera will throw off the homography estimates. Motion blur also reduces the accuracy of the point correspondences, which in turn reduces the accuracy of the homography estimates.

Once the homographies have been estimated, each entry h_11^k, ..., h_33^k for a pair of frames f_k, f_(k+1) provides some information about the transformation the camera underwent between the two frames. To examine entries from different homographies together, we must first normalize each matrix to a common scale, which we do by multiplying each matrix by 1/h_33^k. Once the homography matrices share a common scale, we can examine the set as a whole by placing each entry from all of the homographies into a single vector:

    v_11 = [h_11^1, h_11^2, ..., h_11^(N-1)]
    v_12 = [h_12^1, h_12^2, ..., h_12^(N-1)]
    ...
    v_32 = [h_32^1, h_32^2, ..., h_32^(N-1)]

This creates a set of eight vectors, one per entry (h_33 having been normalized to 1), that together represent the camera motion over the entire video sequence.

3 Motion Correction

Once the camera motion has been estimated, a decision must be made about exactly which camera motions to keep and which to correct. We take a heuristic approach and assume that any camera motion lasting less than 4 seconds is unwanted. To remove these unwanted motions, each of the eight vectors v_11, v_12, ..., v_32 is convolved with a Gaussian kernel whose window size equals 2 times the frame rate (in frames per second) of the sequence. We denote the resulting convolved vectors η_11, η_12, ..., η_32. The effect of the Gaussian filter on this set of vectors is similar to that of a Gaussian filter applied to a 2D image: high-frequency noise is removed, while dominant features are smoothed out. This smoothing is exactly the characteristic we want the resulting video sequence to possess. After convolution, the smoothed vectors are recombined into an ideal set of homography matrices

    Ĥ_k = [ η_11(k)  η_12(k)  η_13(k) ;
            η_21(k)  η_22(k)  η_23(k) ;
            η_31(k)  η_32(k)  η_33(k) ]

that we would like each pair of frames f_k, f_(k+1) to respect.

After the ideal set of homography matrices has been determined, the existing frames of the video sequence must be transformed so that each pair of transformed frames respects the corresponding ideal homography. We start from the very first frame of the sequence and observe that the second frame can be transformed to respect the ideal homography. Given an initial set of points pts_1 = [(x_1^1, y_1^1), (x_2^1, y_2^1), (x_3^1, y_3^1), (x_4^1, y_4^1)] from the first frame, these points can be mapped both onto the second frame and onto the ideal second frame:

    f_1 → f_2:        pts_2 = H_1 · pts_1
    f_1 → ideal f_2:  pts_2,ideal = Ĥ_1 · pts_1

Once these two point sets have been obtained, they can be used to calculate the homography H′_2 between the second frame and the ideal second frame:

    f_2 → ideal f_2:  pts_2,ideal = H′_2 · pts_2

H′_2 can then be used to transform the entire second frame, so that the ideal homography now defines the relationship between the first frame and the corrected second frame. Calculating H′_3 is similar, except that H′_2 now enters the computation; this is explained more clearly in Fig. 1. The diagram shows a set of initial points pts_2 from frame f_2 being mapped onto frame f_3 via the estimated homography H_2. The same set of points also undergoes two homography transformations: one maps the points onto ideal frame 2, and another maps those transformed points onto ideal frame 3.

    f_2 → f_3:              pts_3 = H_2 · pts_2
    f_2 → ideal f_3:        pts_3,ideal = Ĥ_2 · H′_2 · pts_2
    ideal f_2 → ideal f_3:  pts_3,ideal = Ĥ_2 · pts_2,ideal

A homography is then calculated between the two resulting point sets,

    f_3 → ideal f_3:  pts_3,ideal = H′_3 · pts_3

and a homography now exists that maps frame 3 onto ideal frame 3. A similar process applies to every subsequent H′_k.
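The entry-wise Gaussian smoothing of Section 3 can be sketched with NumPy. The window of 2 × fps samples comes from the report; the choice of σ = window/6, so that roughly ±3σ spans the window, is an assumption, since the report does not state the kernel's standard deviation.

```python
import numpy as np


def smooth_homographies(homographies, fps):
    """Smooth each homography entry over time (Section 3).

    `homographies` is the list of estimated 3x3 matrices, each already
    scaled so that h_33 = 1. Every entry's trajectory across the sequence
    is convolved with a Gaussian kernel whose window spans 2 * fps frames,
    and the smoothed entries are reassembled into the ideal homographies.
    """
    H = np.stack(homographies)                  # shape (N-1, 3, 3)
    window = 2 * fps                            # kernel width from the report
    sigma = window / 6.0                        # assumption: +/-3 sigma spans the window
    t = np.arange(window) - (window - 1) / 2.0
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    ideal = np.empty_like(H)
    for i in range(3):
        for j in range(3):
            # smooth this entry's trajectory; 'same' keeps the length N-1
            ideal[:, i, j] = np.convolve(H[:, i, j], kernel, mode="same")
    # re-scale so h_33 = 1 again (convolution edge effects shrink it slightly)
    ideal /= ideal[:, 2:3, 2:3]
    return ideal
```

A longer window smooths away slower camera motions; the 2 × fps choice corresponds to the report's 4-second heuristic at the cost of treating any shorter motion as noise.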
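The chained construction above admits a compact closed form: writing H_k for the estimated homographies, Ĥ_k for their smoothed ideal counterparts, and H′_k for the warp taking frame k onto ideal frame k, eliminating the mapped points gives H′_k = Ĥ_(k-1) · H′_(k-1) · H_(k-1)⁻¹ with H′_1 the identity. This is algebraically equivalent to the report's procedure of fitting a homography to the four transformed points; a minimal NumPy sketch:

```python
import numpy as np


def correction_homographies(estimated, ideal):
    """Chain the per-frame correction homographies (Section 3).

    `estimated[k]` maps frame k onto frame k+1 (0-based) and `ideal[k]` is
    its smoothed counterpart. The returned list gives H'_k, the warp taking
    frame k onto ideal frame k, via
        H'_{k+1} = Hhat_k @ H'_k @ inv(H_k),
    the closed form of the four-point construction in the text.
    """
    corrections = [np.eye(3)]  # the first frame is left untouched
    for H_est, H_ideal in zip(estimated, ideal):
        H_next = H_ideal @ corrections[-1] @ np.linalg.inv(H_est)
        corrections.append(H_next / H_next[2, 2])  # keep h_33 = 1
    return corrections
```

Each frame f_k would then be warped with H′_k (for example with cv2.warpPerspective) and the warped frames reassembled into the stabilized sequence. The recursion also makes the failure mode explicit: an error in any single estimated homography contaminates every later correction.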
    f_(k-1) → f_k:              pts_k = H_(k-1) · pts_(k-1)
    ideal f_(k-1) → ideal f_k:  pts_k,ideal = Ĥ_(k-1) · pts_(k-1),ideal
    f_k → ideal f_k:            pts_k,ideal = H′_k · pts_k

This means that each subsequent frame transformation depends on all of the frame transformations calculated before it. Intuitively, the current image should be aligned so as to minimize unwanted camera motion with respect to a previous image that has itself been aligned, but this creates a problem in the implementation. In particular, the initial homography estimates must be precise: should an earlier homography matrix be estimated incorrectly, all subsequent homography matrices will also be incorrect.

4 Results

The video sequences were taken with a handheld digital camera capturing images at a rate of 15 fps. Since we wanted to remove motions lasting less than 4 seconds, a Gaussian

blur of an appropriate size was applied to the translations. The datasets were captured by a subject recording a video sequence while walking down a hall or around a particular object. Each video sequence contains a significant amount of unwanted motion, both from each step the subject takes and from the subject's inability to hold the camera absolutely still.

To better visualize the effect of convolving a vector of homography entries, Fig. 4 compares the measured values of h_13^k (black) with the smoothed values (red). The motion incurred by each step while walking is clearly depicted in the sinusoidal form that each of the two curves takes. The effect of the smoothing can also be seen in Figs. 2 and 3, which were created by taking 20 frames from both the corrected and uncorrected video sequences and averaging across the frames. These images illustrate the motion present before and after correction, and the corrected averages display a distinct contrast to the uncorrected ones: each corrected average conveys a much more distinct camera motion, while the uncorrected averages show many spurious motions. In Fig. 2, note the shelves, which show more erratic blurring in the uncorrected average than in the corrected one.

5 Conclusions

Our camera stabilization routine was able to correct for many types of camera instability. The algorithm relies on very few assumptions beyond those inherent in calculating homographies, and the one we made regarding unwanted motion can easily be replaced with another equally valid one. However, since the corrected homography of each frame depends on all those that precede it, should one homography matrix be estimated incorrectly, the error propagates to all of the frames after it.
Unfortunately, the serial nature of these calculations cannot be avoided, as the homography matrix of each frame is inherently dependent on the one before it. This makes our algorithm highly dependent on the accuracy of the initial homography estimates.

6 Future Work

For our camera stabilization algorithm to be applied to real-world applications, object motion must first be detected and separated from scene motion. Currently,

Figure 1: The initial points of a frame mapped onto the next frame and, via the correction homographies, onto the ideal frames.

Figure 2: Walking down a hall. Uncorrected (left) and corrected (right) sequences, averaged over 25 frames.

Figure 3: Moving across a table. Uncorrected (left) and corrected (right) sequences, averaged over 20 frames.

Figure 4: The measured and smoothed vectors containing the values of the 7th entry of all the homography matrices.

point correspondences are determined via normalized correlation, so should the brightness of the scene suddenly change (a cloud moving over the sun, for example), we would need a point-correspondence algorithm that does not depend on pixel intensities alone. The homography matrices also contain structure that we neglect entirely by blurring their entries directly. If we were instead to recover the camera's actual relative rotations and translations from the homography matrices, the manipulations would be far more intuitive.

7 References

[1] Förstner, W. & Gülch, E. (1987). A Fast Operator for Detection and Precise Location of Distinct Points, Corners and Centres of Circular Features. ISPRS Intercommission Workshop, Interlaken.

[2] Hartley, R. (1997). In Defense of the Eight-Point Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 6.

[3] Fischler, M. A. & Bolles, R. C. (1981). Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM, Vol. 24, No. 6.