A Duality Based Approach for Realtime TV-L 1 Optical Flow

A Duality Base Approach for Realtime TV-L 1 Optical Flow C. Zach 1, T. Pock 2, an H. Bischof 2 1 VRVis Research Center 2 Institute for Computer Graphics an Vision, TU Graz Abstract. Variational methos are among the most successful approaches to calculate the optical flow between two image frames. A particularly appealing formulation is base on total variation (TV) regularization an the robust L 1 norm in the ata fielity term. This formulation can preserve iscontinuities in the flow fiel an offers an increase robustness against illumination changes, occlusions an noise. In this work we present a novel approach to solve the TV-L 1 formulation. Our metho results in a very efficient numerical scheme, which is base on a ual formulation of the TV energy an employs an efficient point-wise thresholing step. Aitionally, our approach can be accelerate by moern graphics processing units. We emonstrate the real-time performance (30 fps) of our approach for vieo inputs at a resolution of 320 240 pixels. 1 Introuction The recovery of motion from images is a major task of biological an artificial vision systems. The main objective of optical flow methos is to compute a flow fiel estimating the motion of pixels in two consecutive image frames. Since optical flow is an highly ill-pose inverse problem, using pure intensity-base constraints generally results in an uner-etermine system of equations, which is generally known as the aperture problem. In orer to solve this problem some kin of regularization is neee to obtain physically meaningful isplacement fiels. In their seminal work [13], Horn an Schunck stuie a variational formulation of the optical flow problem. { } min u 1 2 + u 2 2 + λ (I 1 (x + u(x)) I 0 (x)) 2. (1) u I 0 an I 1 is the image pair, u = (u 1 (x), u 2 (x)) T is the two-imensional isplacement fiel an λ is a free parameter. The first term (regularization term) penalizes high variations in u to obtain smooth isplacement fiels. The secon term (ata term) is also known as the optical flow constraint. It assumes, that the intensity values of I 0 (x) o not change uring its motion to I 1 (x + u(x)).

Since the Horn-Schunck moel penalizes eviations in a quaratic way, it has two major limitations. It oes not allow for iscontinuities in the isplacement fiel, an it oes not hanle outliers in the ata term robustly. To overcome these limitations, several moels incluing robust error norms an higher orer ata terms have been propose. Since iscontinuities in the optical flow appear often in conjunction with high image graients, several authors replace the homogeneous regularization in the Horn-Schunck moel with an anisotropic iffusion approach [15, 19]. Other propose moifications substitute the square penalty functions in the Horn-Schunck moel with more robust variants. Blake an Ananan [5] apply estimators from robust statistics an obtain a robust an iscontinuity preserving formulation for the optical flow energy. Aubert et al. [3] analyze energy functionals for optical flow incorporating an L 1 ata fielity term an a general class of iscontinuity preserving regularization forces. Papenberg et al. [16] employ a ifferentiable approximation of the TV (resp. L 1 ) norm an formulate a neste iteration scheme to compute the isplacement fiel. Most approaches for optical flow computation replace the nonlinear intensity profile I 1 (x + u(x)) by a first orer Taylor approximation to linearize the problem locally. Since such approximation is only vali for small isplacements, aitional techniques are require to etermine the optical flow correctly for large isplacements. Scale-space approaches [1] an coarse-to-fine warping (e.g. [2, 14, 7]) provie solutions to optical flow estimation with large isplacements. In several applications, such as autonomous robot navigation, it is necessary to calculate isplacement fiels in real-time. Real-time optical flow techniques typically consier only the ata fielity term to generate isplacement fiels [10, 18]. One of the first variational approaches to compute the optical flow in realtime was presente by Bruhn et al. [8, 9]. In their work a highly efficient multigri approach is employe to obtain real-time or near real-time performance. The aim of their approach is very similar to our objective: obtaining robust an iscontinuity preserving solutions for optical flow with highly efficient implementations. Nevertheless, we utilize a completely ifferent solution strategy as escribe in the next sections. 2 TV-L 1 Optical Flow In the basic setting two image frames I 0 an I 1 : ( R 2 ) R are given. The objective is to fin the isparity map u : R 2, which minimizes an imagebase error criterion together with a regularization force. In this work we focus on the plain intensity ifference between pixels as the image similarity score. Hence, the target isparity map u is the minimizer of { } λφ (I 0 (x) I 1 (x + u(x))) + ψ(u, u,...) x, (2) where φ (I 0 (x) I 1 (x + u(x))) is the image ata fielity, an ψ(u, u,...) epicts the regularization term inucing the shape prior. λ weights between the

ata fielity an the regularization force. Selecting φ(x) = x 2 an ψ( u) = u 2 results in the Horn-Schunck moel [13]. The choice of φ(x) = x an ψ( u)) = u yiels to the following functional consisting of an L 1 ata penalty term an total variation regularization: E = { } λ I 0 (x) I 1 (x + u(x)) + u x. (3) Although Eq. 3 seems to be simple, it offers some computational ifficulties. The main reason is that both the regularization term an the ata term are not continuously ifferentiable. One approach is to replace φ(x) = x an ψ( u) with ifferentiable approximations φ ε (x 2 ) = x 2 + ε 2 an ψ ε ( u) = u 2 + ε 2, an to apply a numerical optimization technique on this slightly moifie functional (e.g. [12, 7]). In this paper we employ a rather ifferent approach. In [11] Chambolle propose an efficient an exact numerical scheme to solve the Ruin-Osher-Fatemi energy [17] for total variation base image enoising. In the following, we will escribe how to aopt this approach for the optical flow case. 2.1 The 1D Stereo Case In this section we restrict the isparities to be non-zero only in the horizontal irection, e.g. a normalize stereo image pair is provie. Hence, u(x) reuces to a scalar u(x), an we use the (sloppy) notation x + u(x) for x + (u(x), 0) T. The following erivation is base on [4], but aapte to the stereo/optical flow setting. At first, we linearize image I 1 near x + u 0, i.e. I 1 (x + u) = I 1 (x + u 0 ) + (u u 0 ) I x 1 (x + u 0 ), where u 0 is a given isparity map an I1 x is the erivative of the image intensity I 1 wrt. the x-irection. Using the first orer Taylor approximation for I 1 means, that the following proceure nees to be embee into an iterative warping approach to compensate for image nonlinearities. Aitionally, a multi-level approach is employe to allow large isparities between the images. For fixe u 0 an using the linear approximation for I 1, the TV-L 1 functional (Eq. 3) now reas as: E = { } λ u I1 x + I 1 (x + u 0 ) u 0 I1 x I 0 + u x. (4) In the following, we enote the current resiual I 1 (x + u 0 ) + (u u 0 ) I1 x I 0 by ρ(u, u 0, x) (or just ρ(u) by omitting the explicit epenency on u 0 an x). Moreover, we introuce an auxiliary variable v an propose to minimize the following convex approximation of Eq. 4: { E θ = u + 1 } 2θ (u v)2 + λ ρ(v) x, (5)

where θ is a small constant, such that v is a close approximation of u. This convex minimization problem can be optimize by alternating steps upating either u or v in every iteration: 1. For v being fixe, solve min u { u + 12θ } (u v)2 x. (6) This is the total variation base image enoising moel of Ruin, Osher an Fatemi [17]. 2. For u being fixe, solve min v { } 1 2θ (u v)2 + λ ρ(v) x. (7) This minimization problem can be solve point-wise, since it oes not epen on spatial erivatives of v. An efficient solution for the first step (Eq. 6) was propose in [11], which uses a ual formulation of Eq. 6 to erive an efficient an globally convergent scheme. Since this algorithm is an essential part of our metho, we reprouce the relevant results from [11]: Proposition 1 The solution of Eq. (6) is given by where p = (p 1, p 2 ) fulfills u = v θ iv p, (8) (θ iv p v) = (θ iv p v) p, (9) which can be solve by the following iterative fixe-point scheme: where p 0 = 0 an the time step τ 1/8. p k+1 = p k + τ (iv p k v/θ) 1 + τ (iv p k v/θ), (10) The next proposition characterizes the minimizer of the secon part (Eq. 7): Proposition 2 The solution of the minimization task in Eq. 7 is given by the following thresholing step: λ θ I1 x if ρ(u) < λ θ (I1 x ) 2 v = u + λ θ I1 x if ρ(u) > λ θ (I1 x ) 2 (11) ρ(u)/i1 x if ρ(u) λ θ (I1 x ) 2. This means, that the image resiual ρ(v) is allowe to vanish, if the require step from u to v is sufficiently small. Otherwise, v makes a boune step from u, such that the magnitue of the resiual ecreases. The proposition above can be shown irectly by analyzing the three possible cases, ρ(v) > 0 (inucing v = u λ θ I x 1 ), ρ(v) < 0 (v = u + λ θ I x 1 ) an ρ(v) = 0 (v = u ρ(u)/i x 1 ).

2.2 Generalization to Higher Dimensions In this section we exten the metho introuce in the previous section to optical flow estimation, i.e. an N-imensional isplacement map u is etermine from two given N-D images I 0 an I 1. The first orer image resiual ρ(u, u 0, x) wrt. a given isparity map u 0 is now I 1 (x+u 0 )+ I 1, u u 0 I 0 (x). Aitionally, we write u for the -th component of u ( {1,..., N}). The generalization of Eq. 5 to more imensions is the following energy: E θ = { u + 1 2θ (u v ) 2 + λ ρ(v) } x. (12) Similar to the stereo setting, minimizing this energy can be performe by alternating optimization steps: 1. For every an fixe v, solve min { u + u 12θ } (u v ) 2 x. (13) This minimization problem is ientical to Eq. 6 an can be solve by the same proceure. Note, that the ual variables are introuce for every imension, e.g. Eq. 8 now reas as u = v θ iv p. (14) 2. For u being fixe, solve min v 1 2θ (u v ) 2 + λ ρ(v). (15) The following proposition generalizes the thresholing step from Proposition 2 to higher imensions: Proposition 3 The solution of the minimization task in Eq. 15 is given by the following thresholing step: λ θ I 1 if ρ(u) < λ θ I 1 2 v = u + λ θ I 1 if ρ(u) > λ θ I 1 2 (16) ρ(u) I 1 / I 1 2 if ρ(u) λ θ I 1 2. This proposition essentially states, that the N-imensional optimization problem can be reuce to a one-imensional thresholing step, since v always lies on the line l going through u with irection I 1 (for every x). This can be seen as follows: The first part in Eq. 15, (u v ) 2 /2θ, is basically the square istance of v to u, an the secon part, λ ρ(v), is the unsigne istance to the line l : ρ(w) = 0, i.e. I 1 (x + u 0 ) + I 1, w u 0 I 0 (x) = 0. If we consier all v µ with a fixe istance µ to u, then the functional in Eq. 15 is minimize for the v µ closest to the line l (with minimal normal istance). This is also vali for the true minimizer, hence the optimum for Eq. 15 is on l. In aition, the one-imensional thresholing step in graient irection can be applie (Proposition 2), resulting in the presente scheme.

3 Implementation This section gives etails on the employe numerical proceure an on the GPUaccelerate implementation for the propose TV-L 1 optical flow approach. Although the iscussion in Section 2.2 is vali for any image imension N 2, our GPU-base implementation is specifically tailore for the case N = 2. 3.1 Numerical Scheme The generally non-convex energy functional for optical flow (Eq. 3) becomes a convex minimization problem after linearization of the image intensities (Eq. 4), but this linearization is only vali for small isplacements. Hence, the energy minimization proceure is embee into a coarse-to-fine approach to avoi convergence to unfavorable local minima. We employ image pyramis with a ownsampling factor of 2 for this purpose. Beginning with the coarsest level, we solve Eq. 3 at each level of the pyrami an propagate the solution to the next finer level. This solution is further use to compute the coefficients of the linear resiual function ρ by sampling I 0 an I 1 using the corresponing pyrami levels. Hence, the warping step for I 1 takes place only once per level. I 1 is approximate by central ifferences. At the beginning of a new level, v is initialize with u, an all p are set to 0. At the coarsest level, the isplacement fiel u starts with 0. Avoiing poor local minima is not the only avantage of the coarse-to-fine approach. It turns out, that the filling-in process inuce by the regularization occurring in textureless region is substantially accelerate by a hierarchical scheme as well. The minimization proceure alternates one step of the fixe-point scheme to upate all p (an therefore u, Eq. 10) with the thresholing step from Proposition 3 to improve v. The implementation of the fixe-point upate (Eq. 10) uses backwar ifferences to approximate iv p an forwar ifferences for the numerical graient computation in orer to have mutually ajoint operators [11]. 3.2 Acceleration by Graphics Processing Units Numerical methos working on regular gris, e.g. rectangular image omains, can be effectively accelerate by moern graphics processing units (GPUs). We employ the huge computational power an the parallel processing capabilities of GPUs to obtain a fully accelerate implementation of our optical flow approach. The GPU-base proceure is essentially a straightforwar Cg implementation of the numerical schemes (Eqs. 10 an 16) with few moifications escribe as follows.

If we write own the alternating minimization steps explicitly, iteration k performs the following upates on u, v an p : 1a. v k+1 T H(u k ) 1b. u k+1 v k+1 θ iv p k for {1, 2} 2. p k+1 pk + τ/θ uk+1 1 + τ/θ u k+1 for {1, 2}, (17) where T H( ) enotes the thresholing step from Eq. 16. These steps can be immeiately implemente on the GPU by appropriate fragment programs using two renering passes. 3 The first pass implements steps 1a an 1b from Eq. 17. The values v k+1 are use only temporarily within the shaer program an nee not to be save explicitly. u k+1 is written to the target texture. The secon shaer program correspons with step 2 from Eq. 17. It turns out, that the utilization of the fragment processors can be improve by upating u an p for two pixels simultaneously. The shaer programs work on the left an on the right half of the images in parallel, with appropriate hanling of borer pixels. Our implementation encoes the two components of u using full 32-bit precision, an the overall four components of p 1 an p 2 are compresse to 16-bit half precision floating point numbers. We currently use a fixe but tunable number of iterations on each level in our implementation, since etermining the maximum upate u k+1 u k still requires an expensive reuction operation even on moern GPUs. 4 Results At first, we provie timing results for our optical flow approach epicte in Table 1. Two harware setups were use to obtain the timing results: a esktop PC equippe with a NViia GeForce 7800 GS car, an a high-en laptop supplie with a NViia GeForce Go 7900 GTX graphics boar. The timing results were obtaine uner the Linux operating system with recent OpenGL graphics rivers an the Cg 1.5 toolkit. The timings in Table 1 are given in frames per secon for the epicte fixe number of iterations on each level of the image pyrami. The measure times inclue the texture uploas to vieo memory an the final visualization of the obtaine isplacement fiel. The timing results inicate, that real-time performance with more than 30 frames per secon can be achieve at 256 256 pixels resolution with our approach. Frames from a live vieo emo application are shown in Figure 1, which continuously reas images from a firewire camera an visualizes the optical flow for ajacent frames. Real-time performance can be achieve with the mobile harware setup. Figure 2 shows common test sequences for optical flow, in particular the Ettlinger Tor, the Rheinhafen an the Yosemite sequences, an their respective flow 3 A single pass variant using only one shaer program is possible as well, but the observe performance is inferior in almost all cases.

GeForce 7800 GS GeForce Go 7900 GTX Image resolution 50 It. 100 200 50 It. 100 200 128 128 56 32.1 17.5 95 57.6 30.9 256 256 18 9.6 5 34.1 17.5 8.9 512 512 5 2.6 1.3 9.3 4.7 2.3 Table 1. Observe frame rates at ifferent image resolutions an with varying number of iterations on our teste harware. (a) First frame (b) Secon frame (c) Optical flow fiel Fig. 1. Capture frames an generate optical flow fiel using our live vieo application. The image resolution is 320 240, an 50 iterations are performe on each level of the image pyrami. The framerate is close to 30 frames per secon in this setting. The flow fiel is visualize using hue to inicate the irection an intensity for the magnitue. fiels. The results for these atasets inicate, that the reuce 16-bit resolution for the ual variables p oes not severely affect the quality of the obtaine flow fiels. Table 2 specifies the obtaine average angular error (AAE) of the flow fiel for the Yosemite ataset wrt. the provie groun truth. If the completely homogeneous sky region is exclue from the AAE calculation, the flow fiel is essentially converge after 50 iterations. Enabling more iterations yiels to slightly inferior results, since the TV-L 1 energy favors piecewise constant flow fiels in the limit. If the sky region is inclue in the evaluation, the AAE error ecreases by increasing the number of iterations. In this case the flow fiel in the sky region converges relatively slowly to the zero isplacement. 50 It. 150 250 Without sky 2.85 2.88 2.89 With sky 5.06 3.7 3.27 Table 2. Average angular error for frame 8 an 9 of the Yosemite sequence at ifferent number of iterations.

(a) Frame 5 (b) Frame 6 (c) Flow fiel, 9.3 fps () Frame 1130 (e) Frame 1131 (f) Flow fiel, 29.1 fps (g) Frame 8 (h) Frame 9 (i) Flow fiel, 27.2 fps Fig. 2. Sample images an obtaine flow fiels for the Ettlinger Tor (512 512 pixels), the Rheinhafen (320 240 pixels) an the Yosemite (320 256 pixels) sequences. 5 Conclusion We presente a novel approach for efficient optical flow estimation using a TV- L 1 energy functional. We evelope a novel fast numerical scheme which can be efficiently implemente on moern graphics processing units. With this we can show real-time performance using online vieo streams. The correctness an quality of our implementation is emonstrate on several atasets. Future work inclues the extension of our approach to hanle color images as well. Aitionally, other image similarity measures, e.g. base on intensity graients, nee to be further explore. The ege preserving nature of total variation can be enhance, if a suitable weighte TV-norm/active contour moel is applie [6]. Future work will aress the incorporation of these feature for stereo an optical flow estimation. Finally, switching from an OpenGL-base implementation to the newer CUDA GPU programming framework is expecte to increase the observe performance substantially.

References 1. L. Alvarez, J. Weickert, an J. Sánchez. A scale-space approach to nonlocal optical flow calculations. In Proceeings of the Secon International Conference on Scale- Space Theories in Computer Vision, pages 235 246, 1999. 2. P. Ananan. A computational framework an an algorithm for the measurement of visual motion. Int. J. Comput. Vision, 2:283 310, 1989. 3. G. Aubert, R. Deriche, an P. Kornprobst. Computing optical flow via variational techniques. SIAM J. Appl. Math., 60(1):156 182, 1999. 4. J.-F. Aujol, G. Gilboa, T. Chan, an S. Osher. Structure-texture image ecomposition moeling, algorithms, an parameter selection. Int. J. Comput. Vision, 67(1):111 136, 2006. 5. M.J. Black an P. Ananan. A framework for the robust estimation of optical flow. In ICCV93, pages 231 236, 1993. 6. X. Bresson, S. Eseoglu, P. Vanergheynst, J. Thiran, an S. Osher. Fast Global Minimization of the Active Contour/Snake Moel. Journal of Mathematical Imaging an Vision, 2007. 7. T. Brox, A. Bruhn, N. Papenberg, an J. Weickert. High accuracy optical flow estimation base on a theory for warping. In European Conference on Computer Vision (ECCV), pages 25 36, 2004. 8. A. Bruhn, J. Weickert, C. Feern, T. Kohlberger, an C. Schnörr. Variational optical flow computation in real time. IEEE Transactions on Image Processing, 14(5):608 615, 2005. 9. A. Bruhn, J. Weickert, T. Kohlberger, an C. Schnörr. A multigri platform for real-time motion computation with iscontinuity-preserving variational methos. Int. J. Comput. Vision, 70(3):257 277, 2006. 10. T. A. Camus. Real-time quantize optical flow. Journal of Real-Time Imaging, 3:71 86, 1997. Special Issue on Real-Time Motion Analyis. 11. A. Chambolle. An algorithm for total variation minimization an applications. Journal of Mathematical Imaging an Vision, 20(1 2):89 97, 2004. 12. T. F. Chan, G. H. Golub, an P. Mulet. A nonlinear primal-ual metho for total variation-base image restoration. In ICAOS 96 (Paris, 1996),, volume 219, pages 241 252, 1996. 13. B.K.P. Horn an B.G. Schunck. Determining optical flow. Artificial Intelligence, 17:185 203, 1981. 14. E. Mémin an P. Pérez. Hierarchical estimation an segmentation of ense motion fiels. Int. J. Comput. Vision, 46(2):129 155, 2002. 15. H.-H. Nagel an W. Enkelmann. An investigation of smoothness constraints for the estimation of isplacement vector fiels from image sequences. IEEE Transactions on Pattern Analysis an Machine Intelligence (PAMI), 8:565 593, 1986. 16. N. Papenberg, A. Bruhn, T. Brox, S. Dias, an J. Weickert. Highly accurate optic flow computation with theoretically justifie warping. Int l J. Computer Vision, pages 141 158, 2006. 17. L. I. Ruin, S. Osher, an E. Fatemi. Nonlinear total variation base noise removal algorithms. Physica D, 60:259 268, 1992. 18. R. Strzoka an C. Garbe. Real-time motion estimation an visualization on graphics cars. In IEEE Visualization 2004, pages 545 552, 2004. 19. J. Weickert an T. Brox. Diffusion an regularization of vector- an matrix-value images. Inverse Problems, Image Analysis an Meical Imaging. Contemporary Mathematics, 313:251 268, 2002.