VIDEO STABILIZATION WITH L1-L2 OPTIMIZATION. Hui Qu, Li Song


VIDEO STABILIZATION WITH L1-L2 OPTIMIZATION

Hui Qu, Li Song
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University

ABSTRACT

Digital videos often suffer from undesirable camera jitters caused by unstable camera motion. In this paper we present a novel video stabilization algorithm based on mixed L1-L2 optimization, aiming to remove unwanted camera movements while keeping as much of the original video information as possible. The proposed algorithm computes smoothed camera paths composed of constant, linear and parabolic segments through L1 constraints, while using the L2 norm of the difference between the smoothed and original camera paths to retain the video information. Unlike other existing methods, a single parameter controls the balance between the two terms, which is both flexible and easy to tune for different practical requirements. We further design an efficient moving-window scheme to support online processing of unlimited-length video. Experimental results demonstrate the good performance of the proposed algorithm.

Index Terms — Video stabilization, online processing, mixed L1-L2 optimization

1. INTRODUCTION

With the development of hand-held camera devices, people can easily capture videos with their cell phones or digital cameras. Compared to film cameras, cell phones are significantly lighter, resulting in low-quality videos with jitters. The same problem occurs when the camera is mounted on a vehicle with unstable motion, such as an unmanned aerial vehicle (UAV). Video stabilization is applied to alleviate the shaking problem, either to improve the visual quality of these videos or as a preprocessing step for other procedures, such as object tracking and object detection, to increase their precision and robustness. Most video stabilization algorithms consist of three main steps [1, 2, 3, 4, 5, 6, 7]: (1) original camera path estimation, (2) smooth camera path computation, and (3) synthesizing the stabilized video.
Different methods differ in these three steps. Video stabilization begins by estimating the original camera path. One can employ feature tracking and 2D linear motion models to compute a 2D camera path [1, 2, 3], or use Structure from Motion (SfM), as Liu et al. [5] do, to estimate the 3D camera path. The 2D methods are computationally efficient, while the 3D camera path is more accurate at the expense of computational complexity. Other methods such as block matching [7] are also useful in certain situations. Smooth camera path estimation removes high-frequency jitters and computes the global transformation necessary to stabilize the current frame. Grundmann et al. [2] used L1 smoothness constraints based on cinematography principles to obtain the smoothed camera path, which leads to good stabilization results but discards much of the original video information. Liu et al. [3] introduced a technique that imposes subspace constraints on feature trajectories: they factor a feature matrix into the product of a camera matrix and a scene matrix, and then smooth the scene matrix. The factorization is not accurate, however, if there are not enough long trajectories. The final step is synthesizing the stabilized video using the transformations obtained during smooth camera path estimation. Many methods, e.g. [2, 3], simply keep the central part of each original frame to achieve better visual quality, but further post-processing, e.g. the inpainting in [4], can be applied to obtain full frames. Among the many 2D stabilization methods, Grundmann's L1 camera path optimization [2], proposed in 2011, can be regarded as the state of the art and has been integrated into Google's YouTube Editor. Our proposed optimization is related to L1 optimization, which minimizes the first, second and third derivatives of the resulting camera path under some linear constraints. However, our algorithm is more general, as we optimize both the L1 norm of the smoothed path and the L2 norm of the difference between the smoothed and original camera paths.
Lee et al. had a similar motivation [6]: they optimized both feature matches and the motion similarity of neighboring features when estimating the camera path. However, they used the L2 norm for both terms and introduced many empirical parameters into the optimization problem, which makes their method hard to implement and to adapt to different video contents. In contrast, our mixed L1-L2 optimization has only one adjustable parameter and can be solved efficiently by convex optimization tools developed in recent years. Motivated by the above works [2, 6], we propose a mixed L1-L2 optimization for 2D video stabilization, which not only achieves good stabilization results but also retains as much information of the original video as possible. By adjusting a single parameter, users can control the trade-off between the degree of stabilization and fidelity to the original video. Furthermore, we

design an efficient moving-window scheme to support online processing of unlimited-length shaky videos.

The rest of the paper is organised as follows. We briefly introduce Grundmann's work in Section 2, present the new mixed L1-L2 model for video stabilization in Section 3, and discuss some key issues in Section 4. Experiments are shown in Section 5 to demonstrate the performance of our algorithm, and Section 6 concludes.

2. PRIOR L1 OPTIMIZATION FRAMEWORK FOR VIDEO STABILIZATION

In [2], Grundmann et al. used feature tracking and 2D linear motion model fitting to compute the original camera path. The frames of the video are denoted by I_1, I_2, ..., I_n, and the motion of features from I_{t-1} to I_t is modeled by a 2D motion model F_t, which is a similarity or affine transform. The original camera path C_t is then defined by

  C_{t+1} = C_t F_{t+1},  i.e.  C_t = F_1 F_2 ... F_t   (1)

With C_t, they expressed the desired optimal path P_t as

  P_t = C_t B_t   (2)

where B_t is the update transform that stabilizes the corresponding frame. Grundmann et al. assumed that the optimal path is composed of only three kinds of segments: a constant path representing a static camera, a path of constant velocity representing a panning or dolly shot, and a path of constant acceleration representing an ease-in/ease-out transition between static and panning cameras. The objective function of their L1 optimization problem is therefore

  O(P) = ω1 |D(P)|_1 + ω2 |D^2(P)|_1 + ω3 |D^3(P)|_1   (3)

where ω1, ω2, ω3 are empirical weights and D denotes the derivative. The relative values of ω1, ω2, ω3 are crucial to the smoothed camera path and must be carefully set. They also added an inclusion constraint to preserve the intent of the video. Finally, Grundmann et al. transformed the original frames by B_t and retained the content within a crop window, so the stabilized video has no blank areas but discards some information near the boundary of the original video. In addition, they performed residual motion suppression to reduce rolling shutter effects.
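The path composition in equation (1) is easy to make concrete. Below is a minimal, hypothetical numpy sketch (not the authors' code) that chains per-frame 2D similarity transforms, represented as 3x3 homogeneous matrices, into the original camera path C_t:

```python
import numpy as np

def similarity(dx, dy, angle=0.0, scale=1.0):
    """Build a 2D similarity transform as a 3x3 homogeneous matrix."""
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    return np.array([[c, -s, dx],
                     [s,  c, dy],
                     [0.0, 0.0, 1.0]])

def camera_path(motions):
    """Accumulate inter-frame motions F_t into the original camera path
    C_t = F_1 F_2 ... F_t (equation 1)."""
    C = np.eye(3)
    path = []
    for F in motions:
        C = C @ F          # C_{t+1} = C_t F_{t+1}
        path.append(C.copy())
    return path

# Example: a camera panning right by 2 px/frame with small vertical jitter.
rng = np.random.default_rng(0)
motions = [similarity(2.0, rng.normal(0.0, 0.5)) for _ in range(10)]
C = camera_path(motions)
print(C[-1][0, 2])   # accumulated x-translation after 10 frames -> 20.0
```

With no rotation, the translations simply accumulate, so the x-component of the path is the cumulative pan; the jitter shows up as high-frequency wiggle in the y-component, which is exactly what the smoothing stage is meant to remove.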
Their algorithm is effective for a wide variety of videos. However, it discards information due to cropping, which may be unsuitable for videos with important information near the boundary. Moreover, the three parameters in equation (3) are set empirically and are hard to adapt to different kinds of videos.

3. VIDEO STABILIZATION WITH MIXED L1-L2 OPTIMIZATION

L1 optimization has the property of sparsity, so the derivatives of the computed optimal path are exactly zero over most segments; L2 optimization instead achieves the best estimate in the least-squares sense, e.g. fitting a line to noisy sample points. To keep as much boundary information of the original video as possible while stabilizing it, we want the optimal smooth camera path to stay close to the original path, which is realized by introducing an L2 term into the objective function:

  O(P) = L1(P) + λ ||P − C||_2^2   (4)

where L1(P) is the L1 term, similar to that in equation (3) with ω1 = ω2 = ω3 = 1:

  L1(P) = |D(P)|_1 + |D^2(P)|_1 + |D^3(P)|_1   (5)

and λ is a weight that adjusts the smoothness of the path. As in [2], the L1 term of the objective consists of the first, second and third derivatives of the optimal path. But unlike [2], we retain C_t in our optimization; e.g. |D(P)|_1 can be decomposed as in equation (6). Since C_t differs from frame to frame, it is unreasonable to remove C_t from the sum of differences in equation (6):

  |D(P)|_1 = Σ_{t=1}^{n−1} |P_{t+1} − P_t| = Σ_{t=1}^{n−1} |C_{t+1} B_{t+1} − C_t B_t|   (6)

The L2 term minimizes the difference between the original camera path and the smoothed path:

  ||P − C||_2^2 = Σ_{t=1}^{n} (P_t − C_t)^2 = Σ_{t=1}^{n} (C_t B_t − C_t)^2   (7)

Compared to the algorithm in [2], we use no separate weights in the L1 term. In fact, the introduction of the L2 norm automatically sets the weights of the three kinds of segments according to the shape of the original path.
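Equations (6) and (7) are written in terms of the update transform B_t; once the optimal path P_t is found, B_t follows directly from equation (2) as B_t = C_t^{-1} P_t. A small numpy sketch (illustrative, not the authors' implementation):

```python
import numpy as np

def update_transforms(C, P):
    """From P_t = C_t B_t (equation 2), recover the per-frame update
    transform B_t = C_t^{-1} P_t used to warp frame t.
    C and P are lists of 3x3 homogeneous motion matrices."""
    return [np.linalg.solve(Ct, Pt) for Ct, Pt in zip(C, P)]

# Sanity check: if the optimal path equals the original path,
# every B_t is the identity (nothing to correct).
C = [np.array([[1.0, 0.0, float(t)],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]]) for t in range(5)]
B = update_transforms(C, C)
```

Each frame is then warped by its B_t (e.g. with a perspective warp) before cropping, which is step 5 of the per-segment algorithm summarized later.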
If the original path is nearly constant (up to high-frequency jitters) over a period of time, the weight of |D(P)|_1 in the L1 term is effectively much greater than that of the other two terms, because the L2 norm pulls the optimal path toward a constant. Likewise, if a segment of the original path has almost constant velocity, |D^2(P)|_1 dominates the L1 term. The optimal paths obtained by our algorithm and by Grundmann's algorithm are shown in Fig. 1. Note that the optimal path of our algorithm is smooth without any weights on the three kinds of segments. Moreover, our optimal path is closer to the original path than Grundmann's, so we can use a larger crop window and retain more content of the original video. There are many off-the-shelf toolboxes for solving such mixed L1-L2 convex optimization problems, as discussed in [8]. Here we use the freely available CVX solver (CVX Research: http://cvxr.com/cvx/).
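The paper solves equation (4) with CVX. As a rough, dependency-free sketch of the same 1-D objective, the L1 terms can be approximated by iteratively reweighted least squares (IRLS); this is a stand-in of my own for a proper convex solver such as CVX, included only to make the objective concrete:

```python
import numpy as np

def smooth_path(c, lam=0.5, iters=50, eps=1e-6):
    """Approximately minimize
        |D(p)|_1 + |D^2(p)|_1 + |D^3(p)|_1 + lam * ||p - c||_2^2
    (a 1-D version of equation 4) by iteratively reweighted least squares.
    c holds one motion parameter of the original camera path per frame."""
    n = len(c)
    c = np.asarray(c, dtype=float)
    # k-th order difference operators D^k as (n-k) x n matrices.
    D = [np.diff(np.eye(n), n=k, axis=0) for k in (1, 2, 3)]
    p = c.copy()
    for _ in range(iters):
        A = lam * np.eye(n)
        for Dk in D:
            w = 1.0 / (np.abs(Dk @ p) + eps)   # IRLS weights standing in for L1
            A += Dk.T @ (w[:, None] * Dk)
        p = np.linalg.solve(A, lam * c)        # normal equations of the weighted LS
    return p

# Jittery 1-D path around a constant-velocity pan.
rng = np.random.default_rng(1)
c = np.arange(100, dtype=float) + rng.normal(0.0, 2.0, 100)
p = smooth_path(c, lam=0.5)
```

The result keeps the sparsity behavior described above: over the constant-velocity pan, the third differences of p collapse toward zero while the λ-weighted L2 term keeps p near c, so the achievable crop window stays large.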

Fig. 1. Optimal camera path (motion in x and in y over frames) obtained by the algorithm in [2] and by our algorithm for the same video. The crop window is 80% of the original frame size; the parameter λ in equation (4) is set to 0.5.

4. KEY ISSUES AND SUMMARY OF OUR ALGORITHM

For a practical video stabilization algorithm, several issues need to be addressed to make the proposed algorithm more robust and efficient.

4.1. Online Video Stabilization

Many post-processing algorithms can handle short video clips. For long videos, however, the number of variables in the optimization problem leads to low efficiency and large memory consumption. To make the algorithm more efficient, we design an online processing scheme. Intuitively, a long video can be cut into several segments that are stabilized separately. The problem is that the optimal path may shift at the beginning of each segment (see Fig. 3(a)), making the whole optimal path discontinuous at the joint frames, so adjacent segments should overlap. Let N denote the length of each segment (the window) and K the number of overlapping frames. When stabilization begins, we compute the optimal path P^(1)_t of the first N frames within the window, and stabilize only the first N − K frames. The window is then moved to the next N frames, overlapping the previous segment by K frames, i.e. covering I_{N−K+1} to I_{2N−K}, and the optimal path P^(2)_t of the N frames within this window is computed as well. For the first K overlapping frames, the optimal path is obtained as the weighted average of P^(1)_t and P^(2)_t:

  P_t = υ_i P^(1)_t + (1 − υ_i) P^(2)_t   (8)

where t = N − K + 1, ..., N, and the weights υ_i, i = 1, 2, ..., K depend on the frame number t. In this paper we simply set υ_i = i/K, i = 1, 2, ..., K, with K = 3.
Subsequently, the optimal path P_t and the update transforms B_t of the first N − K frames in the current window are obtained and used to stabilize those frames. The window then moves forward and the same process repeats until the end of the video, as shown in Fig. 2.

Fig. 2. The moving-window process. The red brace marks the range of the window.

4.2. The choice of parameter λ

Since we do not set separate weights for the three L1 terms of the cost function, the choice of λ is crucial for the results. If λ is too small, the optimization problem approaches that of equation (3) with ω1 = ω2 = ω3 = 1, and the optimal camera path will not be smooth enough at transitions between a constant path and a path of constant velocity. Worse, the shift at the beginning of each segment may be large for small λ, so the optimal path over the overlapping frames computed by equation (8) becomes inaccurate (Fig. 3(a)); the whole optimal path is then not as smooth as desired, and the visual quality of the stabilized video suffers as well. If λ is too large, the optimization concentrates on the L2 part, pulling the optimal path toward the original path and leaving it insufficiently smooth (Fig. 3(d)).

Fig. 3. Optimal camera paths obtained by our algorithm with different values of λ, increasing from (a) to (d).

In fact, λ can be treated as a factor that controls the degree of stabilization. For videos with no important information near the boundary, λ can be relatively small to obtain the best visual quality. For videos that may carry key information near the boundary, such as surveillance and UAV videos, λ may be set somewhat larger. Therefore, we
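The overlap blending of equation (8) can be sketched in a few lines of numpy. This is an illustration with an arbitrary overlap length; note that the paper's convention υ_i = i/K weights the previous window's path more heavily as t advances — whichever direction is intended, the point is a linear cross-fade that removes the discontinuity at the segment joint:

```python
import numpy as np

def blend_overlap(p_prev, p_next, K):
    """Cross-fade two per-window optimal paths over their K overlapping
    frames (equation 8): P_t = v_i * P1_t + (1 - v_i) * P2_t, v_i = i/K.
    p_prev / p_next hold the overlap values from the previous and the
    current window."""
    v = np.arange(1, K + 1) / K
    return v * p_prev + (1.0 - v) * p_next

# Two windows whose paths disagree by a constant shift on the overlap.
K = 30
p1 = np.full(K, 5.0)   # previous window's path over the overlap
p2 = np.full(K, 6.0)   # current window's path, shifted by 1
blended = blend_overlap(p1, p2, K)
```

For these two constant segments the blended path ramps monotonically from near p2 at the start of the overlap to exactly p1 at its end, so neither window's path introduces a jump.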

can not only reduce jitters but also preserve most of the information, although the stabilized videos retain some low-frequency shake. In short, users can set the value of λ according to their needs to obtain the stabilized videos they expect.

4.3. Summary of the proposed algorithm

The proposed algorithm for each segment is summarized in Algorithm 1.

Algorithm 1: Video stabilization for each segment
  Step 1: Feature selection and tracking, outlier rejection
  Step 2: Fit the motion model F_t and compute the original camera path C_t by equation (1)
  Step 3: Solve the mixed L1-L2 optimization problem of equation (4) to obtain the update transform B_t
  Step 4: Reduce rolling shutter effects as in [2]
  Step 5: Stabilize the original frames by B_t

In step 1, we track features by pyramidal Lucas-Kanade [9], as Grundmann et al. do, but we perform global outlier rejection with RANSAC. To improve the accuracy of outlier rejection, we impose a minimum distance between features so that the selected features are distributed relatively uniformly over the whole frame. We also periodically re-select the features being tracked to reduce their accumulated error. In step 3, the problem carries the same inclusion and proximity constraints as in [2]. In step 4, a homography replaces the similarity model in some frames to suppress rolling shutter effects, owing to its higher accuracy in modeling inter-frame motion. However, homographies are unstable and the replacement must be carefully controlled; we use a method similar to that of Grundmann et al. [2].

5. EXPERIMENTS

To evaluate the performance of the proposed algorithm, we have applied it to stabilize typical shaky videos, and we compare our method with that of Grundmann et al. [2]. sidewalk is a surveillance video captured by a shaky camera.

Fig. 4. Stabilized frames of the sidewalk video. The first row shows our result with a 95% crop window; the second row shows the result of Grundmann et al. [2] with a 90% crop window.
Some frames of the stabilized results produced by our algorithm and by Grundmann's algorithm are shown in Fig. 4. λ is set to 0.5, and the window and overlap lengths are as described in Section 4.1. The weights in equation (3) are set to ω1 = 10, ω2 = 1, ω3 = 100. We can retain 95% of the original frames because the optimal path stays close to the original path, whereas Grundmann's method preserves only 90% of the content. There is a time recorder at the bottom of the video; our stabilized frames retain most of this time information, while Grundmann's results lose it. Consequently, if we want to analyze the video after stabilization, e.g. to determine when the three people in the center of the first frame walk out of the camera's view, we cannot obtain this information from the video stabilized by Grundmann's method. Moreover, the visual quality of our stabilized video is nearly the same as that of Grundmann et al. More results and comparisons are available at http://www.youku.com/playlist_show/id_889274.html.

6. CONCLUSION

We have proposed a novel approach to video stabilization. By introducing a mixed L1-L2 optimization, we obtain stabilized videos while preserving as much information as possible. We further design an efficient moving-window scheme to support online processing of unlimited-length video. In contrast to the algorithm of Grundmann et al. [2], our method is more useful for videos with important information near the boundary.

7. ACKNOWLEDGEMENT

This work was supported by the National 863 project (22aa 73), NSFC (622, 6936), the Project (B7 22) and the Shanghai Key Laboratory of Digital Media Processing and Transmissions.

8. REFERENCES

[1] S. Battiato, G. Gallo, G. Puglisi, and S. Scellato, "SIFT features tracking for video stabilization," in Proc. International Conference on Image Analysis and Processing (ICIAP), 2007, pp. 825–830.

[2] M. Grundmann, V. Kwatra, and I. Essa, "Auto-directed video stabilization with robust L1 optimal camera paths," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 225–232.

[3] F. Liu, M. Gleicher, J. Wang, H. Jin, and A. Agarwala, "Subspace video stabilization," ACM Transactions on Graphics, vol. 30, 2011.

[4] Y. Matsushita, E. Ofek, W. Ge, et al., "Full-frame video stabilization with motion inpainting," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1150–1163, 2006.

[5] F. Liu, M. Gleicher, H. Jin, and A. Agarwala, "Content-preserving warps for 3D video stabilization," in ACM SIGGRAPH, 2009.

[6] K.-Y. Lee, Y.-Y. Chuang, B.-Y. Chen, and M. Ouhyoung, "Video stabilization using robust feature trajectories," in Proc. IEEE International Conference on Computer Vision (ICCV), 2009, pp. 1397–1404.

[7] S. Battiato, A. R. Bruna, and G. Puglisi, "A robust video stabilization system by adaptive motion vectors filtering," in Proc. International Conference on Multimedia and Expo (ICME), 2008, pp. 373–376.

[8] M. Zibulevsky and M. Elad, "L1-L2 optimization in signal and image processing," IEEE Signal Processing Magazine, vol. 27, no. 3, pp. 76–88, 2010.

[9] J. Shi and C. Tomasi, "Good features to track," in IEEE CVPR, 1994.