Topics in over-parametrization variational methods

Shachar Shem-Tov




Topics in over-parametrization variational methods

Research Thesis

Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

Shachar Shem-Tov

Submitted to the Senate of the Technion - Israel Institute of Technology
Tamuz 577, Haifa, July 2


The research thesis was done under the supervision of Prof. Alfred M. Bruckstein in the Computer Science Department.

Acknowledgments: First of all, I would like to thank my advisor, Prof. Alfred M. Bruckstein, whose passion for the subject of over-parametrization methods kept the research fire going even through hard times. I would also like to thank Dr. Gilad Adiv, who kept his door open for me at all times and offered both academic and personal views, ideas and advice that were very insightful. I would like to thank Guy Rosman for the fruitful collaboration, including the long and interesting discussions, and for giving me access to his vast knowledge. I am also grateful to all the other people I have worked with and talked to, among whom are Dr. Tal Nir, Prof. Ron Kimmel, Yana Katz and many others. Last but not least, I would like to thank my family, especially my wife Shirly, who was there for me through both hardship and success and offered her unconditional support, without which I would not have been able to complete this research thesis. Thank you. Shachar

The generous financial support of the Technion and the Israel Science Foundation (ISF) (grant no. 55/9) is gratefully acknowledged.


Contents

Abstract
Abbreviations and Notations

1 Introduction

2 Over-parameterized Optical Flow using a Stereoscopic Constraint
  2.1 Introduction
  2.2 Background
    2.2.1 The Variational Framework
    2.2.2 Epipolar Geometry
  2.3 Estimation of the Fundamental Matrix
  2.4 A Flow Model Based on Local Homographies
    2.4.1 Euler-Lagrange Equations
    2.4.2 Implementation
  2.5 Experimental results
  2.6 Conclusions

3 On Globally Optimal Local Modeling: From Moving Least Squares To Over-Parametrization
  3.1 Introduction
  3.2 The local modeling of data
  3.3 Global priors on local model parameter variations
  3.4 The over-parameterized functional
    The over-parameterized functional weaknesses
  3.5 The non-local over-parameterized functional
    3.5.1 The modified data term: a non-local functional implementing MLS
    The modified regularization term
    Effects of the proposed functional modifications
  Euler-Lagrange equations
  Implementation
    Weighting scheme
    Initialization
  Experiments and results
    1D experiments
    1D Results
    2D example
  Conclusion

4 Conclusion

Remarks
Abstract in Hebrew

List of Figures

2.1 Epipolar geometry
2.2 Optical flow results on the Middlebury test set
2.3 Grove2 and Urban2 sequence results
3.1 Piecewise linear signal with one discontinuity
3.2 Snapshots of modified over-parameterized functional minimization (α = .5)
3.3 Snapshots of modified over-parameterized functional minimization (α = )
3.4 Snapshots of non-local over-parameterized functional minimization
3.5 Weighting function
3.6 Piecewise linear test signals
3.7 Nonlinear test signals
3.8 Parameters reconstruction
3.9 Signal noise removal and parameters reconstruction comparison for the One jump signal
3.10 Signal noise removal and parameters reconstruction comparison for the One discontinuity signal
3.11 Signal noise removal and parameters reconstruction comparison for the Many jumps & discontinuities signal
3.12 Signal noise removal and parameters reconstruction comparison for the highly sampled One discontinuity signal
3.13 Signal noise removal and parameters reconstruction comparison for the highly sampled Many jumps & discontinuities signal
3.14 Comparison of noise removal with various algorithms
3.15 Reconstructed parameter comparison
3.16 Comparison of noise removal with various algorithms
3.17 Signal noise removal comparison of the nonlinear signals
3.18 Comparison of noise removal on the polynomial signal
3.19 C¹ continuous 2nd degree polynomial signal
3.20 C¹ continuous 2nd degree polynomial signal reconstruction
3.21 C¹ continuous 2nd degree polynomial parameters reconstruction
3.22 C¹ continuous 2nd degree polynomial signal performance graphs
3.23 2D noise removal example
3.24 2D parameters reconstruction

List of Tables

2.1 AAE comparison for static scenes of the Middlebury training set and for the Yosemite sequence


Abstract

This thesis discusses a variational methodology which involves local modeling of data from noisy samples, combined with global regularization of the model parameters. This methodology enables the reconstruction of a local parameterized model describing the signal while addressing the main goal of the application at hand. We show that this methodology encompasses many previously proposed algorithms, from the celebrated moving least squares methods to the globally optimal over-parametrization methods. Moreover, the unified look at the range of problems and methods previously considered suggests a wealth of novel global functionals and local modeling possibilities. Over-parametrization is a general variational methodology which enables prior knowledge about the properties of the problem domain to be readily incorporated by means of a set of basis or dictionary functions. The novelty of this methodology lies in the fact that the regularization term is designed to penalize deviation from the model which describes the domain, whereas common functionals penalize deviation in the signal itself. This methodology yields excellent results for noise removal and for optical flow estimation. Specifically, we discuss two applications of the proposed variational methodology. We begin by showing that the over-parameterized functional regularization may be expanded into an implicit segmentation process. Using a motion model which imposes a stereoscopic constraint, we show that this functional generates state-of-the-art optical flow results. We then propose to incorporate a new non-local variational functional into the methodology, and show that it greatly improves robustness and accuracy in local model recovery compared to previous methods. The proposed methodology may be viewed as the basis for a general framework addressing a variety of common research domains in signal and image processing and analysis, such as denoising, adaptive smoothing, reconstruction and segmentation.

Abbreviations and Notations

E_D        Data or fidelity term
E_S        Smoothness or regularization term
E_{S,AT}   Regularization term modified with the Ambrosio-Tortorelli scheme
u, v       Optical flow in the x, y directions, respectively
Ψ(s²)      Convex approximation of the L1 norm
M          Generic model of the optical flow
A_i        Model parameters
φ_i        Model basis functions
α          Relative weight constant between the data term and the regularization term
v_AT       Ambrosio-Tortorelli diffusivity function
ε          Small constant value
ρ_1, ρ_2   Ambrosio-Tortorelli scheme constants
F          Fundamental matrix
e, e′      Left and right epipoles
π          A plane
H          Planar homography
f_noisy    Noisy signal
w(x, y)    Weighting function
AT         Ambrosio-Tortorelli scheme
TV         Total variation
OP         Over-parameterized method
MOP        Modified over-parameterized method (with the AT scheme)
NLOP       Non-local over-parameterized method
K-SVD      Denoising algorithm
STD        Standard deviation
C¹         Continuously differentiable function

Chapter 1

Introduction

A fundamental problem in both image and signal processing is that of recovering a function, a curve or a surface (i.e., a signal or an image) from its noisy and distorted samples. Significant research effort has addressed this problem and the results obtained so far are quite remarkable. The most important ingredient in the success of any method that extracts signals from noise is, of course, the set of assumptions that summarizes our prior knowledge about the properties of the signal, which effectively differentiate it from the noise. These assumptions range from vague general requirements of smoothness on the signals, to quite detailed information on the structure or functional form of the signals that might be available due to prior knowledge of their sources. The prior information on the signal or image is often expressed in the form of a parameterized model. For instance, in speech recognition [5], slowly varying coefficients of the short-time Fourier transform (STFT) are used to locally describe and model highly fluctuating spectral characteristics over time. In object recognition, the choice of the correct spatial support for objects, i.e., the segmentation, is a fundamental issue [4]; hence, in general scene understanding, support maps are used to represent the segmentation of images into homogeneous chunks, enabling the representation of objects as disjoint regions with different local modeling parameters [22]. In geometric modeling, B-splines (which are essentially a continuous set of piecewise polynomials) are used for local curve and surface approximation, interpolation and fitting from noisy samples [2]. In model-based texture segmentation [27], the selection of an appropriate model support set is important to obtain a good texture representation, which is the basis for the segmentation process. One of the widespread and successful methods for local signal modeling is the

celebrated moving least squares local fitting method (MLS) [37]. This method fits a parameterized model to each signal sample by analytically solving a local set of linear equations. A variational method, on the other hand, allows for applying global assumptions expressing prior knowledge on the variations of the local parameters. A model-based global variational methodology, the over-parameterized approach, was originally proposed by Nir et al. [46], who applied the over-parameterized method to signal denoising. They showed that the proposed methodology is able to better express the denoised signal domain than the classical total variation denoising model originally proposed by Rudin et al. [53]. It does so by regularizing the variations of the model parameters instead of regularizing the noisy signal itself. They showed that the over-parameterized denoising model produced superior denoising results compared with total variation denoising. Nir et al. [47] then employed the over-parameterized method for optical flow computation, producing state-of-the-art results. By assuming a given motion model, such as an affine motion model, they represented each optical flow vector by different values of the model parameters. The over-parameterized methodology was further analyzed and generalized by Bruckstein [4], through the realization that over-parametrization methods naturally follow from combining the classical moving least squares methods with global priors on parameter variations. The importance of the over-parameterized method lies in the fact that it allows for directly incorporating knowledge about the problem domain; hence it can easily be extended to address numerous problem areas in image and signal processing, such as denoising and optical flow. Moreover, due to the structure of the functional it is able, unlike many common methods, to accurately recover the underlying model of the signal. The focus of the successful work presented by Nir et al.
was to generate excellent results for the addressed problem, i.e., the best denoising or optical flow field one can obtain using the over-parameterized functional. The main focus of this work is the problem of parameter reconstruction, guided by the understanding that the model parameters express the nature of the underlying signal: the more accurate the parameter reconstruction we achieve, the better the results we obtain for the problem domain we address, whatever it may be. In order to achieve a better parameter reconstruction, we supplemented the over-parameterized regularization term with an Ambrosio-Tortorelli scheme [3, 55]. This scheme allows the parameter regularization process to become an implicit segmentation process of the parameter field, generating sharper and more accurate parameter discontinuities than could be obtained with the original over-parameterized regularization term.

Using this modified over-parameterized functional, we returned to the problem of computing optical flow, specifically for stereoscopic image pairs. We defined a new disparity field arising from the epipolar geometry. This model required us to generalize the over-parameterized functional to a non-linear formulation. Using this flow model and the Ambrosio-Tortorelli scheme, the regularization process becomes an implicit segmentation process of the visible surface in the scene into planar patches, i.e., the regularization process becomes a segmentation process with a physical meaning. This over-parameterized functional generated the best optical flow results on several stereoscopic image pairs from the Middlebury optical flow test set [8]. The complete description of our new methodology is presented in Chapter 2.

Further experimentation with the over-parameterized functional, with and without the Ambrosio-Tortorelli scheme modification, taught us that while this functional indeed generates excellent denoising/optical flow results and better parameter reconstruction results (when applied with the Ambrosio-Tortorelli scheme), it still suffers from several weaknesses. We learned that the functional does not truly convey the piecewise-constant global prior on the parameters obtained in the minimization process, as demonstrated in Chapter 3. Trying to cope with these weaknesses, we returned to the 1D noise removal over-parameterized functional [46]. We modified the functional by generalizing the data term into a non-local functional while preserving the Ambrosio-Tortorelli regularization term. The resulting functional encompasses the weighted least squares fit method, broadening it by applying global regularization on the parameter variations.
In Chapter 3, we demonstrate that this modification indeed overcomes the over-parameterized functional's weaknesses and significantly improves its parameter reconstruction capability, thus improving the overall noise removal performance.

Chapter 2

Over-parameterized Optical Flow using a Stereoscopic Constraint

2.1 Introduction

Optical flow is defined as the motion field between consecutive frames in a video sequence. Its computation often relies on the brightness constancy assumption, which states that the pixel brightness corresponding to a given scene point is constant throughout the sequence. Optical flow computation is a notoriously ill-posed problem. Hence, additional assumptions on the motion are made in order to regularize the problem. Early methods assumed spatial smoothness of the optical flow [3, 39]. Parametric motion models [2, 44], and more recently machine learning [5], were introduced in order to take into account the specificity of naturally occurring video sequences. In parallel, the regularization process was made much more robust [5, 7, 8, 2]. In this chapter, we focus on optical flow computation in stereoscopic image pairs, given a reliable estimation of the fundamental matrix. This problem has already been addressed in [3, 56, 6, 6]. The papers [3, 56] expressed the optical flow as a one-dimensional problem. This was done either by working on a rectified image pair [3], or by solving for the displacement along the epipolar lines [56]. A different approach [6, 6] merely penalized deviation from the epipolar constraint. In addition, [6] proposed a joint estimation of the stereoscopic optical flow and the fundamental matrix. Finally, in order to treat the problem of occluded areas and object boundaries, Ben-Ari and Sochen [] suggested to explicitly account

for regions of discontinuities. Yet a third body of works turned to a complete modeling of the scene flow [49, 3, 9]. While this approach is the most general, we focus in this chapter on static scenes, for which a more specific parametrization can be found. While the reported experimental results in the aforementioned papers are very convincing, their regularization methods still rely on the traditional assumption that optical flow should be piecewise smooth. Here, motivated by the over-parametrization approach presented in [47], the optical flow is obtained by estimating the space-time dependent parameters of a motion model, the regularization being applied to the model parameters. In [4], homogeneous coordinates were used to express a homography model, which allows selecting a geometrically meaningful coordinate system for this problem. Here we elaborate upon this model by adding an Ambrosio-Tortorelli scheme, which gives a physically meaningful interpretation to the minima obtained in the optimization process. In the case of a static scene, the optical flow can be factored into a model determined by the camera motion and an over-parameterized representation of the scene. The scene motion is described locally as a homography satisfying the epipolar constraint and parameterized by the equation of a local planar approximation of the scene. Assuming that the scene can be approximated by a piecewise smooth manifold, enforcing piecewise spatial smoothness on the homography parameters becomes an axiomatically justified regularization criterion which favors piecewise smooth planar regions.

2.2 Background

2.2.1 The Variational Framework

In the variational framework for optical flow, the brightness constancy and smoothness assumptions are integrated in an energy functional. Let (u(x, y, t), v(x, y, t)) denote the optical flow at pixel coordinates (x, y) and time t.
Brightness constancy determines the data term of the energy functional,

$$E_D(u,v) = \int_\Omega \Psi\left(I_z^2\right) dx\,dy, \tag{2.1}$$

where

$$I_z = I(x+u,\, y+v,\, t+1) - I(x,\, y,\, t), \tag{2.2}$$

and $\Psi(s^2) = \sqrt{s^2 + \varepsilon^2}$ is a convex approximation of the $L_1$ norm for a small $\varepsilon$. Let $M(A,x,y,t)$ denote a generic model of the optical flow at pixel $(x,y)$ and time $t$, where $A = (A_i(x,y,t))_{i \in \{1,\dots,n\}}$ is a family of functions parameterizing the model, i.e.,

$$\begin{pmatrix} u(x,y,t) \\ v(x,y,t) \end{pmatrix} = M(A,x,y,t). \tag{2.3}$$

We begin with the smoothness term proposed by Nir et al. in [47],

$$E_S(A) = \int_\Omega \Psi\left(\sum_{i=1}^{n} \|\nabla A_i\|^2\right) dx\,dy. \tag{2.4}$$

In order to refine the discontinuities and obtain a physically meaningful regularization, we extend the smoothness prior using the Ambrosio-Tortorelli scheme [3, 55]:

$$E_{S,AT}(A) = \int_\Omega (v_{AT})^2\, \Psi\left(\sum_{i=1}^{n} \|\nabla A_i\|^2\right) + \rho_1 (v_{AT} - 1)^2 + \rho_2 \|\nabla v_{AT}\|^2 \, dx\,dy, \tag{2.5}$$

where $v_{AT}$ is a diffusivity function, ideally serving as an indicator of the discontinuity set in the flow field. Choosing $\rho_1$ and $\rho_2$ suitably coupled and gradually decreasing $\rho_2$ towards $0$ can be used to approximate the Mumford-Shah model [45] via a $\Gamma$-convergence process, but we do not pursue this direction in this thesis. While the Ambrosio-Tortorelli scheme has been used in the context of optical flow [6, 9, ], in our case this seemingly arbitrary choice of regularization and segmentation has a physical meaning. The regularization of the flow becomes a segmentation process of the visible surface in the scene into planar patches, each with its own set of plane parameters. In addition, it helps us obtain accurate edges in the resulting flow. Furthermore, the generalized Ambrosio-Tortorelli scheme allows us to explicitly reason about the places in the flow where the nonlinear nature of the data manifold manifests itself. Suppose we have a piecewise-planar, static scene, and an ideal solution $(A, v_{AT})$ where $A$ is piecewise constant and the diffusivity function $v_{AT}$ is $0$ at planar region boundaries and $1$ elsewhere. With such a solution, we expect two neighboring points which belong to different regions to have a very small diffusivity value $v_{AT}$ connecting them, effectively nullifying the interaction between different planes' parameters. Furthermore, the cost associated with this
Furthermore the cost associated with this 8

solution is directly attributed to the measure of the discontinuity set in the image, i.e., to

$$\int_\Omega \rho_1 (v_{AT} - 1)^2 + \rho_2 \|\nabla v_{AT}\|^2 \, dx\,dy. \tag{2.6}$$

Thus, the proposed ideal solution becomes a global minimizer of the functional, as determined by the measure of discontinuities in the 2½-D sketch [42]. This is directly related to the question raised by Trobin et al. [59] regarding the over-parameterized affine flow model and its global minimizers. The complete functional now becomes

$$E(A) = E_D(M(A,x,y,t)) + \alpha E_{S,AT}(A). \tag{2.7}$$

In the remainder of this chapter, we will propose a motion model enforcing the epipolar constraint and show how to minimize the proposed functional.

2.2.2 Epipolar Geometry

Let us introduce some background on epipolar geometry, so as to motivate the choice of the motion model. A complete overview can be found in [24, 29]. Given two views of a static scene, the optical flow is restricted by the epipolar constraint. Figure 2.1 shows that a pixel $m$ in the left image is restricted to a line $l'$, called an epipolar line, in the right image. All the epipolar lines in the left (resp. right) image go through $e$ (resp. $e'$), which is called the left (resp. right) epipole. In projective geometry, image points and lines are often represented by 3D homogeneous coordinates,

$$m = \lambda \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \in \mathbb{R}^3. \tag{2.8}$$

Image points and their corresponding epipolar lines are related by the fundamental matrix,

$$l' = F m. \tag{2.9}$$

Consider a plane $\pi$, visible from both cameras, and the planar homography $H_\pi$ which corresponds to the composition of the back-projection from the left view to the plane $\pi$ and the projection from $\pi$ to the right view (see Figure 2.1). The

homography $H_\pi$ gives rise to a useful decomposition of the fundamental matrix,

$$F = [e']_\times H_\pi, \tag{2.10}$$

where $[e']_\times$ is the matrix representation of the cross product with $e'$.

Figure 2.1: Epipolar geometry.

2.3 Estimation of the Fundamental Matrix

One of the main challenges in estimating optical flow using the epipolar geometry is to retrieve an accurate and robust estimate of the fundamental matrix. Mainberger et al. [4] showed that robustness of the fundamental matrix estimation can be achieved by using dense optical flow instead of applying RANSAC or LMedS methods to a sparse set of matches. Hence, we use as initialization the Horn-Schunck with Charbonnier penalty function optical flow implementation provided by Sun et al. [57], modified to use color images. This represents a baseline nonlinear optical flow method, as in [57]. In addition to allowing the computation of the fundamental matrix, this initialization also serves as a starting point for our optical flow computation algorithm. Many methods aimed at estimating the fundamental matrix can handle large numbers of correspondences. Among those, we choose a robust M-estimation method based on the symmetric epipolar distance, the implementation of which is made very efficient by the use of the Levenberg-Marquardt algorithm, as explained in [34].
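The epipolar relations above can be checked numerically on synthetic data. The sketch below is an illustration only, not part of the thesis pipeline: for a calibrated pair P = [I | 0], P' = [R | t], the epipole e' is proportional to t and F = [t]×R, so a left-image point m maps to the epipolar line l' = Fm on which its correspondence m' must lie.

```python
import numpy as np

# Numerical illustration of the epipolar relations (synthetic data, not the
# thesis pipeline): for a calibrated pair P = [I|0], P' = [R|t], we have
# e' ~ t and F = [t]_x R. We map a left image point m to its epipolar line
# l' = F m and verify the incidence relation m'^T F m = 0.

def cross_matrix(e):
    """Matrix [e]_x such that cross_matrix(e) @ v == np.cross(e, v)."""
    return np.array([[0.0, -e[2], e[1]],
                     [e[2], 0.0, -e[0]],
                     [-e[1], e[0], 0.0]])

theta = 0.1                                   # small rotation about the y axis
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.5, 0.1, 0.0])                 # camera translation

F = cross_matrix(t) @ R                       # fundamental matrix of this pair

X = np.array([0.3, -0.2, 2.0])                # a 3D scene point
m = X / X[2]                                  # left view:  m  ~ P (X, 1)
Xp = R @ X + t
mp = Xp / Xp[2]                               # right view: m' ~ P' (X, 1)

l_prime = F @ m                               # epipolar line of m in the right image
residual = mp @ l_prime                       # incidence m'^T F m, should be ~0
```

The incidence residual vanishes up to floating-point error for any scene point, which is the constraint the chapter's motion model enforces by construction.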

2.4 A Flow Model Based on Local Homographies

We now proceed to develop the model and motivation for the flow equations. Suppose the camera is calibrated, with projection matrices

$$P(t) = P = \begin{pmatrix} I & 0 \end{pmatrix}, \qquad P(t+1) = P' = \begin{pmatrix} R & t \end{pmatrix}, \tag{2.11}$$

where $R$ is a rotation matrix and $t$ is a translation vector expressing the camera motion between the two consecutive frames at $t$ and $t+1$. We assume that locally, the scene is well approximated by the plane

$$v^T x + d = 0, \tag{2.12}$$

where $(x^T, d)^T = (x, y, 1, d)^T$ denotes the 3D scene point visible at pixel $x$, in homogeneous coordinates. The corresponding point of $x$ at time $t+1$ is

$$x' = P' \begin{pmatrix} x \\ d \end{pmatrix} = Rx + td = (R - tv^T)x \tag{2.13}$$

in homogeneous coordinates. Here $v$ designates the normal of the local planar approximation of the scene, and $(v^T x)$ is related to the depth of the scene at time $t$. The planar homography expressed in (2.13) gives a geometrically meaningful motion model parameterized by $v$. From now on, consider $v$ as a function of the pixel coordinates. Under the assumption that the scene can be approximated by a piecewise smooth manifold, $v$ must be piecewise smooth. We now derive the motion parametrization. In general, the camera parameters are not known, but we can re-parameterize the planar homography using $e'$ and $F$. In the following derivation we assume a calibrated view for simplicity's sake. Let $H(x, y, t)$ denote the planar homography motion model. We have

$$H \propto R - tv^T. \tag{2.14}$$

For any compatible planar homography $H_0$ (cf. [29]; we will provide a specific choice later on),

$$\exists (v_0, \mu): \quad H_0 = \mu(R - tv_0^T), \tag{2.15}$$

so that

$$H = H_0 - \mu t (v - v_0)^T. \tag{2.16}$$

As $t$ and $e'$ are parallel, we can also write

$$H = H_0 + e' \, \frac{\mu\, e'^T t}{\|e'\|^2} (v_0 - v)^T. \tag{2.17}$$

Hence, $H(x, y, t)$ can be parameterized by the function

$$A(x,y,t) = \frac{\mu\, e'^T t}{\|e'\|^2} \left(v_0 - v(x,y,t)\right), \tag{2.18}$$

so that

$$H(x,y,t) = H_0 + e' A(x,y,t)^T. \tag{2.19}$$

The parametrization $A$ is the unknown field we want to compute in order to model and estimate the optical flow. The piecewise smoothness of $A$ is a direct consequence of the piecewise smoothness of $v$, as testified by (2.18). More precisely, minimization of the Ambrosio-Tortorelli regularization term favors segmentation of the visible surface into planar patches where the data evidence permits it. When the cameras are not calibrated, the relationship between the parametrization $A$ and $v$ is still linear. In fact, the calibration matrices mainly affect the relative weighting of the model parameters' smoothness. Our experiments show that even without controlling the relative smoothness of the model parameters, the optical flow can be estimated accurately. Note that the parametrization $A$ can also be derived directly from the fundamental matrix decomposition (2.10). For $H_0$, we can choose the special matrix

$$H_0 = S = [e']_\times F. \tag{2.20}$$

Each column of $S$, together with the corresponding column of $F$ and $e'$, forms an orthogonal basis of $\mathbb{R}^3$, so that (2.10) is satisfied. $S$ is a degenerate homography which projects points in the left image to points of the line represented by $e'$ in the right image. Next, we use the notations

$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \qquad e' = \begin{pmatrix} x_{e'} \\ y_{e'} \\ z_{e'} \end{pmatrix}, \qquad H_0 = \begin{pmatrix} h_1^T \\ h_2^T \\ h_3^T \end{pmatrix}, \tag{2.21}$$

to signify the 3D point coordinates, the epipole's homogeneous coordinates,

and the homography matrix rows, respectively. The parametrization of $H$ is introduced into the expression of the optical flow:

$$\begin{pmatrix} u \\ v \end{pmatrix} = M(A,x,y,t) = \lambda \begin{pmatrix} h_1^T x + x_{e'} A^T x \\ h_2^T x + y_{e'} A^T x \end{pmatrix} - \begin{pmatrix} x \\ y \end{pmatrix}, \qquad \lambda = \frac{1}{h_3^T x + z_{e'} A^T x}, \tag{2.22}$$

where $(x, y)$ are the coordinates of the corresponding pixels in the left image.

2.4.1 Euler-Lagrange Equations

By interchangeably fixing $A_i$, $i = 1, \dots, n$, and $v_{AT}$, we obtain the Euler-Lagrange equations which minimize the functional.

Minimization with respect to $A_i$. Fixing $v_{AT}$, we obtain

$$\forall i, \quad \nabla_{A_i} (E_D + \alpha E_{S,AT}) = 0. \tag{2.23}$$

The variation of the data term with respect to the model parameter function $A_i$ is given by

$$EL(E_D(u,v)) = 2\, \Psi'\!\left(I_z^2\right) I_z \, \partial_{A_i} I_z, \tag{2.24}$$

where

$$\partial_{A_i} I_z = \lambda^2 x_i \left(x_{e'}\, h_3^T x - z_{e'}\, h_1^T x\right) I_x^+ + \lambda^2 x_i \left(y_{e'}\, h_3^T x - z_{e'}\, h_2^T x\right) I_y^+, \tag{2.25}$$

and

$$I_x^+ = I_x(x+u,\, y+v,\, t+1), \tag{2.26}$$

$$I_y^+ = I_y(x+u,\, y+v,\, t+1). \tag{2.27}$$

For the smoothness term, the Euler-Lagrange equations are

$$EL(E_{S,AT}) = \left(\nabla (v_{AT})^2\right)^T \Psi'\!\left(\sum_{i=1}^{n} \|\nabla A_i\|^2\right) \nabla A_i + (v_{AT})^2\, \mathrm{div}\!\left(\Psi'\!\left(\sum_j \|\nabla A_j\|^2\right) \nabla A_i\right); \tag{2.28}$$

thus, the energy is minimized by solving the nonlinear system of equations

$$\Psi'\!\left(I_z^2\right) I_z \, \partial_{A_i} I_z - \alpha \left(\nabla (v_{AT})^2\right)^T \Psi'\!\left(\sum_{i=1}^{n} \|\nabla A_i\|^2\right) \nabla A_i - \alpha (v_{AT})^2\, \mathrm{div}\!\left(\Psi'\!\left(\sum_j \|\nabla A_j\|^2\right) \nabla A_i\right) = 0. \tag{2.29}$$

Minimization with respect to $v_{AT}$. Fixing $A_i$, we obtain

$$2\alpha\, v_{AT}\, \Psi\!\left(\sum_{i=1}^{n} \|\nabla A_i\|^2\right) + 2\rho_1 (v_{AT} - 1) - 2\rho_2\, \Delta v_{AT} = 0. \tag{2.30}$$

2.4.2 Implementation

Minimization with respect to $v_{AT}$ is straightforward, as the equations are linear in $v_{AT}$; therefore we only elaborate on the minimization with respect to $A_i$. The nonlinear Euler-Lagrange equations minimizing $A_i$ are linearized by adopting three embedded loops, similarly to [47]. First, the warped image gradient $(I_x^+, I_y^+)$ is frozen, and so is $\lambda$. At each iteration $k$, we have

$$\left(\partial_{A_i} I_z\right)^k = x_i\, d^k, \tag{2.31}$$

where

$$d^k = (\lambda^k)^2 \left(x_{e'}\, h_3^T x - z_{e'}\, h_1^T x\right) (I_x^+)^k + (\lambda^k)^2 \left(y_{e'}\, h_3^T x - z_{e'}\, h_2^T x\right) (I_y^+)^k,$$

and the following approximation is made using a first-order Taylor expansion:

$$I_z^{k+1} \approx I_z^k + d^k \sum_{i=1}^{3} x_i\, dA_i^k, \tag{2.32}$$

where

$$dA^k = A^{k+1} - A^k. \tag{2.33}$$

The system of equations (2.29) becomes

$$\Psi'\!\left((I_z^{k+1})^2\right) \left(I_z^k + d^k \sum_{j=1}^{3} x_j\, dA_j^k\right) x_i\, d^k - \alpha \left(\nabla (v_{AT})^2\right)^T \Psi'\!\left(\sum_{i=1}^{n} \|\nabla A_i^{k+1}\|^2\right) \nabla A_i^{k+1} - \alpha (v_{AT})^2\, \mathrm{div}\!\left(\Psi'\!\left(\sum_j \|\nabla A_j^{k+1}\|^2\right) \nabla A_i^{k+1}\right) = 0. \tag{2.34}$$

A second loop, with superscript $l$, is added to cope with the nonlinearity of $\Psi'$:

$$(\Psi')^{k,l}_{\mathrm{Data}} \left(I_z^k + d^k \sum_{j=1}^{3} x_j\, dA_j^{k,l+1}\right) x_i\, d^k - \alpha \left(\nabla (v_{AT})^2\right)^T (\Psi')^{k,l}_{\mathrm{Smooth}}\, \nabla A_i^{k,l+1} - \alpha (v_{AT})^2\, \mathrm{div}\!\left((\Psi')^{k,l}_{\mathrm{Smooth}}\, \nabla A_i^{k,l+1}\right) = 0, \tag{2.35}$$

where

$$(\Psi')^{k,l}_{\mathrm{Data}} = \Psi'\!\left(\left(I_z^k + d^k \sum_{i=1}^{3} x_i\, dA_i^{k,l}\right)^{\!2}\right), \qquad (\Psi')^{k,l}_{\mathrm{Smooth}} = \Psi'\!\left(\sum_j \|\nabla A_j^{k,l}\|^2\right).$$

At this point, the system of equations is linear and sparse in the spatial domain. The solution $A$, as well as the diffusivity term $v_{AT}$, are obtained through Gauss-Seidel iterations. In the case of the Ambrosio-Tortorelli regularization term, the diffusion term of the equation is modulated by $v_{AT}$.
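The lagged-nonlinearity pattern used above, freezing the Ψ' weights from the previous iterate so the Euler-Lagrange equations become linear and then relaxing them with Gauss-Seidel sweeps, can be demonstrated on a much simpler 1D robust-smoothing problem. The sketch below shows that numerical pattern only; it is not the thesis solver, and the model problem and constants are invented for illustration.

```python
import numpy as np

# 1D illustration (not the thesis solver) of the lagged-nonlinearity pattern:
# freeze the Psi' weights computed from the previous iterate, which makes the
# Euler-Lagrange equations linear, then relax them with Gauss-Seidel sweeps.
# Model problem: min_u sum_j (u_j - f_j)^2 + alpha * sum_j Psi((u_{j+1}-u_j)^2),
# with Psi(s^2) = sqrt(s^2 + eps^2); constants are illustrative.

def robust_smooth(f, alpha=2.0, eps=1e-3, outer=20, inner=50):
    u = f.copy()
    n = len(f)
    for _ in range(outer):                    # lagged-nonlinearity (outer) loop
        d = np.diff(u)
        w = 1.0 / np.sqrt(d**2 + eps**2)      # frozen weights w_j = 2*Psi'(d_j^2)
        for _ in range(inner):                # Gauss-Seidel sweeps on the linear system
            for j in range(n):
                num, den = 2.0 * f[j], 2.0
                if j > 0:                     # coupling to the left neighbor
                    num += alpha * w[j - 1] * u[j - 1]
                    den += alpha * w[j - 1]
                if j < n - 1:                 # coupling to the right neighbor
                    num += alpha * w[j] * u[j + 1]
                    den += alpha * w[j]
                u[j] = num / den              # in-place update = Gauss-Seidel
    return u

f = np.concatenate([np.zeros(50), np.ones(50)]) \
    + 0.1 * np.random.default_rng(1).standard_normal(100)
u = robust_smooth(f)                          # smooth plateaus, preserved jump
```

Because the frozen weights are small across large differences, the step edge in the test signal survives the relaxation while the noise within each plateau is flattened, mirroring the role of the robust Ψ in the flow functional.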

2.5 Experimental results

We now demonstrate motion estimation results using our algorithm, both visually and in terms of the average angular error (AAE). No post-processing was applied to the optical flow field obtained after energy minimization. The algorithm was tested on image pairs from the Middlebury optical flow test set [8], as well as on all images with a static scene and publicly available ground truth optical flow from the training set. Results from the training set are presented in Table 2.1. The flow, parameters, and diffusivity field resulting from our method are presented in Figure 2.3. The optical flow is shown with color encoding and a disparity map. Results from the test set are shown in Figure 2.2. A smoothness parameter α = 4 was used in all experiments, and the Ambrosio-Tortorelli coefficients were set to ρ1 = 2, ρ2 = 5 5. The proposed method produced the best results to date on the static Yosemite and Urban scenes. The algorithm is not designed, however, for non-static scenes, where the computed epipolar lines have no meaning. One possible solution to this shortcoming is to return to a 2D search [6]. Such a combined approach is left for future work. In the Teddy and Grove test images, the initialization of our algorithm introduced errors in significant parts of the image, which our method could not overcome. This behavior is related to the problem of finding a global minimum for the optical flow, which is known to have several local minima. Improving the global convergence using discrete graph-based techniques has been the focus of several papers (see [35, 38, 32], for example), and is beyond the scope of this work. We expect better initialization to improve the accuracy to that of the Yosemite and Urban image pairs. Our optical flow estimation for the Yosemite and Urban sequences gives the best results to date, achieving an AAE of .25 for the Yosemite sequence test pair and 2.38 for the Urban sequence, as shown in Figure 2.2.
When the fundamental matrix estimate was improved (by estimating it from the ground truth optical flow), we reduced the AAE to .66 for Yosemite! It is interesting to look at the results obtained for scenes with planar regions, such as the Urban2 image pair (Figure 2.3). In Urban2, the scene is composed of many planar patches, modeled by constant patches in the model parameters. In both these scenes, as well as others, the resulting diffusivity field clearly marks the contours of the planar regions in the image, such as the buildings in Urban2 and the tree and soil ridges in Grove2.
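The AAE values reported throughout this section follow the convention of Barron et al.: the mean angle between the space-time direction vectors (u, v, 1) and (u_gt, v_gt, 1). The sketch below is a reference implementation of that convention, not the thesis's evaluation code.

```python
import numpy as np

# Average angular error (AAE), as conventionally defined for optical flow
# benchmarks (Barron et al.): the mean angle between the space-time direction
# vectors (u, v, 1) and (u_gt, v_gt, 1). Reference sketch of that convention
# only; not the thesis's evaluation code.

def average_angular_error(u, v, u_gt, v_gt):
    num = u * u_gt + v * v_gt + 1.0
    den = np.sqrt((u**2 + v**2 + 1.0) * (u_gt**2 + v_gt**2 + 1.0))
    ang = np.arccos(np.clip(num / den, -1.0, 1.0))   # clip guards rounding
    return np.degrees(ang).mean()
```

For example, an estimated flow of (1, 0) against a ground truth of (0, 0) yields an angle of 45 degrees at that pixel, since the vectors (1, 0, 1) and (0, 0, 1) differ by that angle.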

Table 2.1: AAE comparison for static scenes of the Middlebury training set and for the Yosemite sequence.

(a) Middlebury training set: AAE and STD per sequence (Grove2, Grove3, Urban2, Urban3, Venus, Yosemite).

(b) Yosemite sequence:

    Method                  AAE
    Brox et al. [7]         .59
    Mémin/Pérez [44]        .58
    Bruhn et al. [8]        .46
    Amiaz et al. [7]        .44
    Roth/Black [5]          .43
    Valgaerts et al. [6]    .7
    Nir et al. [47]         .5
    Our method              .85

Figure 2.2: Average angular error values of our algorithm, compared on the Middlebury test set. The smoothness coefficient was set to α = 4 in all experiments. Red marks the row of the suggested algorithm.

Figure 2.3: Grove2 and Urban2 sequence results. (a) Grove2, (b) optical flow estimation, (c) parameter field, (d) disparity estimation, (e) diffusivity function; (f) Urban2, (g) optical flow estimation, (h) parameter field, (i) disparity estimation, (j) diffusivity function.

2.6 Conclusions

A new method for optical flow computation was presented, which hinges on the guiding principle that optical flow regularization should have a strong theoretical foundation. The method is applicable to static scenes and retrieves meaningful local motion parameters related to the scene geometry. At each pixel, the parameters provide an estimate of the plane tangent to the scene manifold, up to a fixed shift and scale. To that extent, they can be seen as a higher-level output than optical flow in the computer vision hierarchy. An interesting aspect of our energy functional, which was already mentioned in [47], is that given a carefully selected over-complete parameter field, the different parameters support each other to form smooth, piecewise-constant parameter patches, while the incorporated Ambrosio-Tortorelli scheme prevents diffusion across discontinuities. Furthermore, the Ambrosio-Tortorelli scheme allows us to

combine regularization and segmentation, resulting in a physically meaningful regularization process, while minimizing the dependency on the relative scaling of the coefficients. Finally, although the performance demonstrated already goes beyond the latest published results, there is still much to be gained from better fundamental matrix estimation and algorithm initialization. In addition, when more than two frames are available and the camera pose is known, augmenting the model with time-smoothness is expected to systematically improve the results.

Chapter 3

On Globally Optimal Local Modeling: From Moving Least Squares To Over-Parametrization

3.1 Introduction

We return here to the problem of recovering a signal from its noisy and distorted samples. Assuming we know some parameterized model describing the signal domain, the recovery problem becomes a fitting problem, in which we wish to estimate the signal's correct parameters. This problem, also known as signal modeling, is a fundamental problem in both image and signal processing. One widespread and successful method for local signal modeling is the celebrated moving least squares (MLS) local fitting method, which has evolved into an important tool in image and signal processing and in computational graphics. In [37], Levin explored the moving least squares method and applied it to scattered-data interpolation, smoothing and gradient approximation. In [36, 2, 5] the moving least squares technique was employed for modeling surfaces from point-sampled data, and proved to be a powerful approach. This was followed by the work of Fleishman et al. [2, 25], incorporating robust statistics mechanisms for outlier removal. Common to these works is the locality of the fitting procedure and the lack of global assumptions expressing prior knowledge on the variations of local

parameters. The aim of this chapter is to show how one can design variational functionals that exploit both local fitting of models and global smoothness assumptions on the variations of model parameters, assumptions that are natural for various types of signals. As mentioned in the introduction, the over-parametrization variational methodology combines local fitting methods with global priors on parameter variations. The local modeling relates to a wealth of classical methods, such as Haralick's and Watson's facet model for images [28], and extends them in many ways. More importantly, the discussion and experimental results reported in this chapter point at a rather general methodology for designing functionals for variational model estimation and signal reconstruction. Focusing on the denoising problem is merely an illustrative test case.

3.2 The local modeling of data

For the sake of simplicity, we shall here limit our discussion to one dimensional signals and address the generic problem of denoising. Let f(x) be a one dimensional signal and f_noisy(x) = f(x) + n(x) be its noisy counterpart. Also, denote by { f_j = f(x_j) + n(x_j) } the set of samples of the noisy signal. Suppose that f can be (locally) described by a parameterized model of the form

    f(x) = Σ_{i=1}^{n} A_i φ_i(x)    (3.1)

where A = {A_i}_{i=1}^{n} is a set of parameters, and φ = {φ_i}_{i=1}^{n} is a set of basis signals. Local modeling is the process of estimating A = {A_i} from the noisy signal f_noisy, or its samples {f_j}, in the neighborhood of a point x_0. As a simple example, we can consider the Taylor approximation as a parameterized model with polynomial basis functions φ_i = x^{i-1}. Suppose we would like to use this local modeling for denoising our noisy data f_noisy(x) or {f_j}. Then, around x = x_0, we want to estimate A_i(x_0), i.e., the parameters of the model (3.1), by solving

    arg min_{[A_1, A_2, ..., A_n]} || f_noisy(x_0) - Σ_{i=1}^{n} A_i φ_i(x_0) ||    (3.2)

in some local neighborhood of x_0, for a given distance norm. This minimization gives

us the best local estimate f̂(x_0):

    f̂(x_0) = Σ_{i=1}^{n} A_i(x_0) φ_i(x_0).    (3.3)

Repeating this process for every location x gives us the moving best estimate of f. The choice of the distance or measure of error is of great importance. One common choice is the weighted least squares distance, as considered, for example, by Farnebäck [23] as a generalization of the facet model:

    || f_noisy(x) - Σ_{i=1}^{n} A_i(x) φ_i(x) ||_w = ∫ ( f_noisy(y) - Σ_{i=1}^{n} A_i(x) φ_i(y) )^2 w(y - x) dy.    (3.4)

This is essentially a weighted L2 norm, where w(·) is a weight function which localizes the estimation of the parameters A_i(x_0). (This error measure can also be applied to a sampled signal.) Both the continuous and the discrete cases described above yield a process for computing the local model parameters A_i(x), and therefore for estimating the signal f̂(x), not only at x_0 but at all x. This process is the well-known moving least squares estimation process. We wish to emphasize that although our proposed methodology, discussed in the next section, focuses on the L2 norm, this is by no means the only choice of error measure. The choice of the error measure may be adapted to each particular problem of interest. Other measures, such as the least sum of absolute values distance [58], or L1 norm, can readily be substituted into the cost functionals.

3.3 Global priors on local model parameter variations

In the previous discussion we did not impose any conditions on the model parameters A_i, and in the definition of the neighborhood around x_0, via the weight functions, we did not use any idea of adaptation to the data. Suppose now that we are given some structural knowledge of the signal f(x). We would like to use this knowledge to improve the estimation process. For example, suppose we a priori know that f(x) is a piecewise polynomial signal over a set

of intervals, i.e., we know that

    f(x) = Σ_{i=1}^{n} A_i^r x^{i-1}  for x ∈ [x_r, x_{r+1}]    (3.5)

but we do not know the sequence of breakpoints {x_r}. Using polynomial basis functions [1, x, x^2, ..., x^{n-1}], we know a priori that a good estimate for f(x) may be provided by piecewise constant sets of parameters {A} over the [x_r, x_{r+1}] segments, with changes in the parameters occurring only at the breakpoints {x_r}. Such knowledge provides us the incentive to impose a global prior on the parameters, such that the minimum achieved by the optimization process will indeed favor them to be piecewise constant. This may be achieved by supplementing the moving least squares local fitting process with constraints on the variations of the parameters in the minimization process. Thus, we shall force the estimated parameters not only to provide the best weighted local fit to the data, but also to be consistent with the local fitting over adjacent neighborhoods. This is where we diverge from and extend the facet model, which assigns the basis functions' parameters at each point using solely a local least squares fit. In this chapter, the assumed structural knowledge of the signal implies that good estimates can be achieved by a piecewise constant model, i.e., a model whose parameters are piecewise constant. Therefore the focus of this work will be to design functionals which also impose global priors on the model parameters. We shall demonstrate that one can design a functional that does indeed fulfil this requirement. The mix of local fitting and global regularity is the main idea and power behind over-parameterized variational methods and is what makes them a versatile problem solving tool. By adapting the local fitting process and incorporating the global prior (in a way that will be described in the following sections), this methodology can be readily applied to address problems in various domains.
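The moving least squares estimation of Section 3.2 can be sketched in a few lines. The code below is an illustrative Python sketch, not the thesis implementation: it uses a uniform weight w over a sliding window and takes the basis functions as arguments, implementing the local fit (3.2) and the best local estimate (3.3) on a sampled signal.

```python
import numpy as np

def moving_least_squares(f_noisy, basis, half_width):
    """Moving least squares (MLS) estimate of a sampled 1D signal.

    At every sample x0 we fit parameters A minimizing
    sum_y w(y - x0) * (f_noisy[y] - sum_i A_i * phi_i(y))**2
    over a window of radius `half_width` with uniform weights w,
    then evaluate the fitted model at x0."""
    n = len(f_noisy)
    xs = np.arange(n, dtype=float)
    est = np.empty(n)
    params = np.empty((n, len(basis)))
    for j in range(n):
        lo, hi = max(0, j - half_width), min(n, j + half_width + 1)
        ys = xs[lo:hi]
        # Design matrix: one column per basis function phi_i(y).
        Phi = np.column_stack([phi(ys) for phi in basis])
        A, *_ = np.linalg.lstsq(Phi, f_noisy[lo:hi], rcond=None)
        params[j] = A
        # Best local estimate at x0 = j: f_hat(x0) = sum_i A_i(x0) phi_i(x0).
        est[j] = sum(a * phi(xs[j:j + 1])[0] for a, phi in zip(A, basis))
    return est, params
```

For instance, with the linear basis [1, x], a noiseless linear signal is reproduced exactly and the slope channel of `params` comes out constant.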
3.4 The over-parameterized functional

In [46] Nir and Bruckstein presented a first attempt at noise removal based on an over-parameterized functional. This functional was, similarly to many well known functionals, a combination of two terms:

    E(f, A) = E_D(f, A) + α E_S(A).    (3.6)

Here α is some fixed relative weight parameter, E_D is a data or fidelity term, and E_S is a regularization term. The data term E_D was chosen to be

    E_D(f, A) = ∫ ( f_noisy(x) - Σ_{i=1}^{n} A_i(x) φ_i(x) )^2 dx.    (3.7)

Note that this functional implies a neighborhood of size 0 in the distance measure (3.4), which means that this data term only penalizes point-wise deviations of the estimated signal Σ_{i=1}^{n} A_i(x) φ_i(x) from f_noisy. The smoothness or regularization term E_S was defined to penalize variations of the parameters A_i as follows:

    E_S(A) = ∫ Ψ( Σ_{i=1}^{n} |∇A_i(x)|^2 ) dx    (3.8)

where Ψ(s^2) = sqrt(s^2 + ε^2). The resulting functional yields a channel-coupled total variation (TV) regularization process for the estimation of the model parameters. Note that (3.8) is an approximate L1 type of regularizer (sometimes referred to as the Charbonnier penalty function). This regularizer makes the functional more robust to outliers, and allows for smaller penalties for large data differences (compared to a quadratic regularizer), while maintaining convexity and continuity [6, 43]. The regularization term was designed to impose the global prior on the parameters. It is channel-coupled to encourage the parameters to change simultaneously, thus preferring a piecewise constant solution as described in Section 3.3. In our experiments (which are discussed in Section 3.7), as well as in [46], this functional displayed good noise removal performance compared with Rudin, Osher and Fatemi's [53] classical total variation noise removal functional. A similar functional, with data term modifications, was used by Nir et al. in [47] for optical flow estimation, producing state of the art results.

3.4.1 The over-parameterized functional weaknesses

Despite the good performance displayed by the over-parameterized functional, it still suffers from the following shortcomings, which were clear in our experiments:

Discontinuities smearing: As mentioned, the regularization term is an approximate L1 regularizer. A precise L1 regularizer is indifferent to the way signal discontinuities appear, i.e., the same penalty is given to a smooth gradual signal change and to a sharp discontinuity (as long as the total signal difference is the same). See for example Pock's PhD work [48] for a detailed example. We consider this property a shortcoming: we expect the reconstructed parameters to be piecewise constant, where discontinuities appear as relatively few sharp changes, hence this regularizer does not convey the intended global prior on the parameters, and does not prefer a truly piecewise constant solution. In practice the problem is even more severe. First, the selection of the ε constant in the Charbonnier penalty function proves to be problematic. Choosing a bigger ε value causes the functional to lose the ability to preserve sharp discontinuities, and actually to prefer smoothing discontinuities out. On the other hand, choosing a smaller ε value degenerates the penalty function. In fact, for any choice of ε, this penalty function will tend to smooth sharp discontinuities. Second, as discussed above, the TV-L1 model suffers from the so-called staircasing effect, where smooth regions are recovered as piecewise constant staircases in the reconstruction. See the work of Savage et al. [54] and references therein for a detailed review of such effects.

Origin biasing: The over-parameterized functional's global minimum may depend on the selected origin of the model. A detailed example, regarding the optical flow over-parameterized functional with affine flow basis functions, is given in the work of Trobin et al. [59]. We also note that the data term presented above only penalizes point-wise deviations from the model, hence it imposes only a point-wise constraint on the functional's minimum, relying only on the regularization term to impose the global constraint.
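The ε trade-off in the Charbonnier penalty discussed above is easy to see numerically. Below is a small illustrative sketch (not from the thesis) of Ψ(s²) = √(s² + ε²) and its two regimes.

```python
import numpy as np

def charbonnier(s, eps):
    """Charbonnier penalty Psi(s^2) = sqrt(s^2 + eps^2): a smooth,
    convex approximation of the L1 penalty |s|."""
    return np.sqrt(s * s + eps * eps)

# Far above eps the penalty is essentially the robust |s|; far below eps
# it behaves like the quadratic eps + s**2 / (2 * eps), which is what
# smooths small variations -- the eps trade-off described above.
```
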
A discussion of why this is a problematic issue is given in Section 3.5.3. Overall, it is evident that, despite producing good results when applied to various applications, the over-parameterized functional model is fundamentally flawed as a tool for parameter reconstruction. On the one hand, the over-parameterized model provides the functional with a wider solution domain than the TV model to choose from, thus often enabling convergence to excellent denoising solutions; on the other hand, the constraints applied to the solution domain through the functional are not strong enough to force convergence to a

piecewise-constant-parameters solution, as demonstrated in Section 3.5.3.

3.5 The non-local over-parameterized functional

To overcome the shortcomings described in Section 3.4.1, we shall modify the functional, both in the data term and in the regularization term, as described below.

3.5.1 The modified data term: a non-local functional implementing MLS

In order to overcome the point-wise character of the data term, and to impose a neighborhood constraint in the spirit of (3.4), we extend it to what is commonly referred to as a non-local functional [4, 33, 26]. This is done simply by defining a weighting function which considers more than point-wise differences. Making the data term a non-local functional requires the parameters to model the signal well over a neighborhood of each point. We note that the robustness of the parameter estimates increases with the size of the support set. On the other hand, increasing the size of the support set too much may reduce the functional's ability to detect discontinuities and to preserve them. The non-local data term is

    E_D = ∫_x ∫_y ( f_noisy(y) - Σ_{i=1}^{n} A_i(x) φ_i(y) )^2 w(x, y) dy dx    (3.9)

and it conforms to the form of the weighted least squares fit distance defined in Section 3.2. Note that there is as yet no restriction on the size or shape of the support set around each point that is induced by the weighting function w(x, y). For demonstration purposes, in this work we defined a simple sliding window weighting function, as described in Section 3.6.1. The search for optimal data dependent weighting functions is an interesting possibility and may be the subject of future research.

3.5.2 The modified regularization term

We extend the over-parameterized regularization term using the Ambrosio-Tortorelli (AT) scheme [3, 55], as described in Section 2.2.1. For the sake of completeness

we repeat here the AT regularization term:

    E_{S,AT} = ∫ (1 - v_AT)^2 Ψ( Σ_{i=1}^{n} |∇A_i|^2 ) + ρ_1 |∇v_AT|^2 + ρ_2 v_AT^2 dx.    (3.10)

As stated in Section 2.2.1, the AT scheme allows the regularization term to prefer few parameter discontinuities and prevents discontinuities from smearing into neighboring pixels via the diffusion process, thus allowing a piecewise smooth solution. This directly addresses the discontinuities smearing effect described in Section 3.4.1. The AT regularization also addresses the origin biasing effect, described in Section 3.4.1, by making the functional much less sensitive to the selected origin.

3.5.3 Effects of the proposed functional modifications

An obvious question arises: why do we need to modify both the data and the regularization terms? To answer this, we first notice that using only the non-local data term improves the local parameter estimates, but cannot prevent the discontinuities smearing effect. A moving least squares process with a small window will yield excellent parameter estimates inside the constant-model segments, but at boundaries it will combine data from neighboring segments, thus smoothing and blurring the estimates over intervals determined by the window size. Therefore we need to add the AT scheme to the functional. But then one might expect that the AT scheme without the non-local data term would suffice, by segmenting the regularization into piecewise constant regions and relying on the data term and the global regularization to recover the correct parameters for each segment. In practice this is not the case. Consider a discontinuous signal y_s, as depicted in Figure 3.1, and suppose we initialize the model parameters to a smoothed version of y_s. We will now discuss applying different functionals for reconstructing the model parameters from y_s.
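Putting the two modifications together, a discrete version of the NLOP energy — the non-local data term (3.9) plus the AT regularizer above — can be assembled as below. This is an illustrative sketch under simplifying assumptions (uniform window weights, forward differences, unit grid spacing), not the thesis implementation.

```python
import numpy as np

def nlop_energy(f_noisy, A, v, basis_vals, alpha, rho1, rho2, eps, half_width):
    """Discrete NLOP energy sketch.

    A          : (n_params, N) parameter functions A_i(x) on the grid
    v          : (N,) AT indicator v_AT (close to 1 near discontinuities)
    basis_vals : (n_params, N) values phi_i(y) at every sample."""
    N = f_noisy.shape[0]
    # Non-local data term: uniform weights w(x, y) = 1 inside a window.
    e_data = 0.0
    for x in range(N):
        lo, hi = max(0, x - half_width), min(N, x + half_width + 1)
        pred = A[:, x] @ basis_vals[:, lo:hi]          # sum_i A_i(x) phi_i(y)
        e_data += np.sum((f_noisy[lo:hi] - pred) ** 2)
    # AT regularizer with forward differences for A_i' and v_AT'.
    dA = np.diff(A, axis=1)
    psi = np.sqrt(np.sum(dA ** 2, axis=0) + eps ** 2)  # Psi(sum_i A_i'^2)
    return (e_data
            + alpha * np.sum((1.0 - v[:-1]) ** 2 * psi)
            + rho1 * np.sum(np.diff(v) ** 2)
            + rho2 * np.sum(v ** 2))
```

For a signal fitted exactly by constant parameters with v_AT ≡ 0, the data term vanishes and only the residual α·ε contribution of Ψ at zero gradient remains.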

Figure 3.1: Solid line: y_s, a piecewise linear signal with one discontinuity. Dashed line: a smoothed version of y_s with which we initialized the model parameters.

Figures 3.2 and 3.3 depict snapshots of the reconstructed signal, the parameters, and the resulting AT indicator function v_AT at different time steps of the minimization process. We used α = .5 and α = respectively. These are snapshots of the minimization process of the point-wise over-parameterized functional modified with the AT regularization term (MOP). Both minimizations were carried out until convergence. It is evident that the reconstructed parameters are not piecewise constant, as we would expect from our prior information on the signal. It is also important to note that in the first series the signal is perfectly reconstructed, effectively nullifying the data term, while in the second series the signal is smoothed; however, the data term energy is still very low. In contrast, Figure 3.4 depicts a series of snapshots of the minimization process of the non-local over-parameterized functional (NLOP). Note that here the parameters are perfectly reconstructed, the AT indicator function v_AT receives a value close to one only in the vicinity of the discontinuity, and the signal is perfectly reconstructed as well. Assuming that the result achieved by the NLOP functional is very close to the global minimum of both functionals, we calculated, using interpolation, hypothetical solutions between the solution achieved by the MOP functional and the global minimum. We then calculated the energy of the MOP functional at each step of the interpolation. It becomes obvious that the energy of the functional rises before dropping to the energy level of the global minimum. Separate calculation of the energies of the data and regularization terms indicates that most of the functional energy is concentrated in the regularization term.
In the transition between the local and global minimum solutions, the regularization term energy rises and dictates the total energy change, while the data term contribution is negligible.
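The interpolation experiment above can be reproduced in a few lines: evaluate the energy on convex combinations of the two solutions and look for a barrier. A generic sketch (the energy passed in below is a stand-in, not the MOP functional):

```python
import numpy as np

def energy_along_path(energy, A_local, A_global, steps=21):
    """Evaluate `energy` on convex combinations of a local-minimum
    solution and the presumed global minimum; a rise before the final
    drop reveals the energy barrier trapping the minimization."""
    ts = np.linspace(0.0, 1.0, steps)
    return ts, np.array([energy((1.0 - t) * A_local + t * A_global) for t in ts])
```

With a toy double-well energy E(A) = Σ (A² − 1)², the path from A ≡ −1 to A ≡ +1 starts and ends at zero energy but passes a barrier at A ≡ 0 — the same signature we observed for the MOP functional.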

The solutions to which the MOP functional converges, together with these energy considerations, lead us to the conclusion that the MOP functional converges to a local minimum solution. This trapping effect is alleviated in the NLOP functional, where the presumed local minimum achieved by the MOP is no longer a local minimum. This is due to the fact that it is much more difficult to drive the energy of the non-local data term close to zero, and it contributes significantly to driving the parameters toward their correct values. Thus, the minimization process does not halt and continues toward the global minimum.

Figure 3.2: From left to right, snapshots at various times of the MOP functional with relative weight of E_S α = .5. The top image displays the reconstructed signal and the v_AT indicator function. The bottom image displays the reconstructed parameters.

Figure 3.3: From left to right, snapshots at various times of the MOP functional with relative weight of E_S α = .

Figure 3.4: From left to right, snapshots at various times of the NLOP functional.

3.5.4 Euler-Lagrange equations

Having designed the functionals to be minimized, by alternately fixing the functions A_i(x), i = 1...n, and v_AT(x) we readily obtain the Euler-Lagrange equations which characterize the minimizers of the functional.

Minimization with respect to A_q(x), q = 1...n (parameter minimization step). Fixing v_AT(x), we obtain, for each q,

    ∇_{A_q} E_D - d/dx ( ∂_{A_q'} (α E_{S,AT}) ) = 0.    (3.11)

The variation of the data term with respect to the model parameter functions A_q(x) is given by

    ∇_{A_q} E_D = -2 ∫_y ( f_noisy(y) - Σ_{i=1}^{n} A_i(x) φ_i(y) ) φ_q(y) w(x, y) dy.    (3.12)

For the smoothness term, the Euler-Lagrange equations are

    d/dx ( ∂_{A_q'} (α E_{S,AT}) ) = -4α (1 - v_AT) v_AT' Ψ'( Σ_{i=1}^{n} A_i'^2 ) A_q' + 2α (1 - v_AT)^2 d/dx ( Ψ'( Σ_{i=1}^{n} A_i'^2 ) A_q' )    (3.13)

thus, the energy is minimized by solving the following nonlinear system of equations at each point x, for q = 1...n:

    -2 ∫_y ( f_noisy(y) - Σ_{i=1}^{n} A_i(x) φ_i(y) ) φ_q(y) w(x, y) dy + 4α (1 - v_AT) v_AT' Ψ'( Σ_{i=1}^{n} A_i'^2 ) A_q' - 2α (1 - v_AT)^2 d/dx ( Ψ'( Σ_{i=1}^{n} A_i'^2 ) A_q' ) = 0.    (3.14)
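In practice the parameter step descends the Euler-Lagrange residual derived above rather than solving it exactly. A minimal discrete sketch (illustrative, not the thesis code): uniform window weights, forward differences for A_i', unit grid spacing, and one explicit gradient-descent update.

```python
import numpy as np

def parameter_step(f_noisy, A, v, basis_vals, alpha, eps, half_width, dt):
    """One explicit gradient-descent update of the parameter functions
    A_i(x), descending the discrete data + AT-smoothness energy."""
    n_par, N = A.shape
    grad = np.zeros_like(A)
    # Data term gradient: -2 * sum_y (f(y) - sum_i A_i(x) phi_i(y)) * phi_q(y)
    for x in range(N):
        lo, hi = max(0, x - half_width), min(N, x + half_width + 1)
        resid = f_noisy[lo:hi] - A[:, x] @ basis_vals[:, lo:hi]
        grad[:, x] = -2.0 * (basis_vals[:, lo:hi] @ resid)
    # Smoothness gradient of sum_edges (1 - v)^2 * sqrt(sum_i dA_i^2 + eps^2).
    dA = np.diff(A, axis=1)                              # forward differences A_i'
    flux = (1.0 - v[:-1]) ** 2 * dA / np.sqrt(np.sum(dA ** 2, axis=0) + eps ** 2)
    smooth_grad = np.zeros_like(A)
    smooth_grad[:, 1:] += flux                           # d/dA[:, x+1] of edge (x, x+1)
    smooth_grad[:, :-1] -= flux                          # d/dA[:, x]   of edge (x, x+1)
    return A - dt * (grad + alpha * smooth_grad)
```

A quick sanity check of the scheme: exact constant parameters fitting a clean linear signal are a fixed point, since both the data residual and the parameter differences vanish.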

Minimization with respect to v_AT(x) (AT minimization step). Fixing the functions A_i(x), i = 1...n, we obtain

    -2α (1 - v_AT) Ψ( Σ_{i=1}^{n} A_i'^2 ) - 2ρ_1 v_AT'' + 2ρ_2 v_AT = 0.    (3.15)

3.6 Implementation

We used central first and second derivatives and reflecting boundary conditions. In all the methods, we used various α, ρ_1 and ρ_2 constants, depending on the noise level, as is common in noise reduction methods (this was done for all the considered algorithms, with the appropriate parameters). In all the examples, we assumed a sampling interval of dx = 1. For the minimization process we used gradient descent with 2 iterations. This minimization method is notoriously slow to converge, but it is fast enough for our 1D examples. We performed an AT minimization step every few parameter minimization steps, and updated the weighting function every few parameter minimization steps. We used the same window size both for the NLOP functional and for the weighted least squares fit method (used for initialization and reconstruction). We note that using a bigger window size resulted in significantly superior results on all our tests, but this may not be the case for other signals: choosing too big a window may cause an overlap between adjacent discontinuities and prevent the functional from correctly recovering them.

3.6.1 Weighting scheme

In this work we defined the following adaptive sliding window weighting function. For each point x of the signal, we choose a window W of length N such that x ∈ W and r(x) = Σ_{j=1}^{N} r(x, y_j) is minimal, where y_j ∈ W and r(x, y_j) is the non-weighted least squares fit distance at x:

    r(x, y_j) = ( f_noisy(y_j) - Σ_{i=1}^{n} A_i(x) φ_i(y_j) )^2.    (3.16)

Specifically, we choose a consecutive set of N points which includes x and minimizes r(x). For example, for a window of size 5, there are 5 possible windows, as described below:

Figure 3.5: The five candidate windows W1-W5 of size 5 containing x, sliding from [x-4, x] to [x, x+4].

We mark the chosen window by w_x, and the selected weight function is then:

    w(y - x) = 1/N  if y ∈ w_x,  0 otherwise.    (3.17)

By no means do we claim that this is the best choice of weighting function. It is but one possible adaptive weighting function selection process that enables the data term to impose a non-point-wise constraint, and it was found to yield good results.

3.6.2 Initialization

In order to prevent trapping into local minima, we initialize the model parameters from the noisy data by means of a robust MLS fitting, i.e., we compute each point's parameters by choosing the best least squares fitting approximation via a sliding window least squares calculation. This was done in exactly the same manner as the sliding window weighting function described in Section 3.6.1. The parameters chosen are those which generated the minimal reconstruction error. This computation provides a robust initialization, which already preserves, to some extent, discontinuities in the signal. This claim is strengthened by experiments comparing it with a regular moving least squares fitting initialization: with the robust initialization, the functional converges to a better solution and nicely exposes the true discontinuities in the signal models.

3.7 Experiments and results

We conducted various experiments in order to verify the performance of the proposed functional. In this work we focus mainly on 1D examples, leaving 2D extensions to images for future publications; nevertheless, we exhibit some initial experiments conducted on 2D synthetic images which give a good idea of the performance to be expected in the 2D case.

3.7.1 1D experiments

We begin with the selection of the basis functions. We consider linear basis functions of the form

    φ_1 = 1,  φ_2 = x.    (3.18)

This seemingly simple choice of functions enables us to make comprehensive tests of the functional's performance. Under this choice, the functional is expected to perform best on piecewise linear signals. We note that this is an arbitrary choice of basis functions, and one should choose basis functions appropriate for the signal domain at hand. To perform the tests we devised a set of synthetic 1D signals, and added white Gaussian noise with standard deviations up to .75. These signals can be separated into two main groups. The first group is comprised of noisy piecewise linear signals. This group is interesting because it enables us to test the parameter reconstruction performance as well as the noise reduction capabilities under the optimal basis functions. The piecewise linear signals are depicted in Figure 3.6. The second group is comprised of noisy nonlinear signals, such as higher degree polynomials. This group is interesting only with regard to noise removal performance, because generating ground truth parameters for linear basis functions is problematic and we do not expect the linear parameters to be piecewise constant. Naturally, we could choose appropriate basis functions for these signals too, but we wanted to demonstrate that our functional performs surprisingly well even when applied with suboptimal basis functions. The nonlinear signals are depicted in Figure 3.7. In order to check the performance of the NLOP functional, and to test it against other denoising algorithms, we also implemented the following noise removal algorithms.
The first two are the classic TV functional and the original over-parameterized functional (OP). We chose these functionals due to their relation to the NLOP functional, enabling us to show the improvement that the NLOP

has on its predecessor functionals. The third and final algorithm is the state of the art K-SVD noise removal algorithm, first proposed by Aharon et al. []. We used the implementation published by Rubinstein et al. [52]. We compared the various algorithms' noise reduction performance and, more importantly, their parameter reconstruction capability. For the latter comparison, we reconstructed parameters from the K-SVD denoised signal using our robust least squares fitting method (see Section 3.6.2), and compared the results with both our NLOP functional and the original OP functional.

Figure 3.6: The piecewise linear test signals and their STD .5 noisy counterparts: (a) One jump, (b) One discontinuity, (c) Many jumps & discontinuities.
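The robust sliding-window fit used for initialization (and for reconstructing parameters from the K-SVD output) can be sketched as follows: for each sample, every contiguous window containing it is fitted by least squares, and the parameters of the window with the smallest residual win. Illustrative Python, not the thesis code; the basis functions are passed in as callables.

```python
import numpy as np

def robust_mls_init(f_noisy, basis, window):
    """Robust moving-least-squares initialization: for each sample x,
    try every length-`window` contiguous window containing x, fit the
    basis by least squares in each, and keep the parameters of the
    window with the smallest squared residual."""
    n = len(f_noisy)
    xs = np.arange(n, dtype=float)
    params = np.empty((n, len(basis)))
    for j in range(n):
        best = (np.inf, None)
        # Candidate windows [s, s + window) that contain sample j.
        for s in range(max(0, j - window + 1), min(j, n - window) + 1):
            ys = xs[s:s + window]
            Phi = np.column_stack([phi(ys) for phi in basis])
            A, *_ = np.linalg.lstsq(Phi, f_noisy[s:s + window], rcond=None)
            r = np.sum((Phi @ A - f_noisy[s:s + window]) ** 2)
            if r < best[0]:
                best = (r, A)
        params[j] = best[1]
    return params
```

Because each point may pick a window lying entirely on its own side of a jump, a clean step signal is reproduced exactly, with the discontinuity preserved rather than smeared.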

Figure 3.7: The nonlinear test signals and their STD .5 noisy counterparts. Left: (a) a polynomial signal of degree 2. Right: (b) a signal combining sine and cosine functions.

3.7.2 1D Results

Noise removal performance testing was done by comparing the L2 norm of the difference between the cleaned reconstructed signal and the original signal, i.e.,

    E_L2 = || f(x) - Σ_{i=1}^{n} A_i(x) φ_i(x) ||_2.    (3.19)

Parameter reconstruction performance testing (which, with the linear basis functions, is only relevant for the piecewise linear signals) was done by calculating the L2 norm of the difference between the reconstructed parameters and the original parameters with which we generated the signal. Figure 3.8 depicts an example of the recovered parameters and the v_AT indicator function obtained for the Many jumps & discontinuities signal. Figures 3.9-3.11 display graphs comparing the performance of the various algorithms on the piecewise linear signals. The left graphs display the noise removal error norms, while the right graphs display the parameter reconstruction error norms.

Figure 3.8: Example of parameter reconstruction. Note that the indicator function tends to 1 where the signal has parameter discontinuities and tends to 0 in almost any other location.

In the various noise removal performance graphs we can see the excellent performance of both the OP and NLOP functionals, reinforcing the claims of outstanding noise removal performance obtained by the OP functional and maintained by our new functional. We can also see that when the signal contains a discontinuity, as in the One discontinuity signal (as opposed to a continuous signal with only a parameter discontinuity, such as the One jump signal), the NLOP functional copes better with the discontinuity, thus generating better results than the OP functional. In Figures 3.14 and 3.15 we display the noise removal comparison of all the algorithms on the One discontinuity and the Many jumps & discontinuities signals. When considering parameter reconstruction performance, we see a totally different picture: in most cases, particularly for signals which contain a signal discontinuity, the NLOP functional and the K-SVD algorithm both outperform the OP functional. This result demonstrates our claim, in Section 3.4.1, that the OP functional cannot adequately enforce a global prior on the reconstructed parameters. Figure 3.16 compares the reconstruction results of the NLOP functional, the OP functional and the reconstruction from the denoised K-SVD signal, on the One discontinuity signal. Note that the reconstruction of the NLOP functional is close to a piecewise constant solution, while the OP reconstruction is seemingly a smoothly changing function. In the K-SVD reconstruction, where at each point a set of parameters is chosen regardless of the choice made for its adjacent neighbors, the lack of a global constraint is evident.

Figure 3.9: Signal noise removal and parameter reconstruction comparison for the One jump signal: (a) signal error norm vs. noise level (non-local, OP, K-SVD, TV); (b) parameter error norm vs. noise level (non-local, OP, K-SVD).

Figure 3.10: Signal noise removal and parameter reconstruction comparison for the One discontinuity signal: (a) signal; (b) parameters.

Figure 3.11: Signal noise removal and parameter reconstruction comparison for the Many jumps & discontinuities signal: (a) signal; (b) parameters.

We note that the K-SVD algorithm was designed for 2D images with

far more samples than in the described experiments. Although it produces excellent denoising results on these signals, these settings are far from optimal for the K-SVD algorithm. We therefore performed additional tests on a subset of the signals, sampled on a much finer grid of around 2 samples per signal, on which K-SVD has a better ability to perform. These extra experiments were performed to ascertain that the reported results are not due to inappropriate settings. As Figures 3.12 and 3.13 show, even on the highly sampled signals the same overall performance is achieved, thus confirming our results.

Figure 3.12: Signal noise removal and parameter reconstruction comparison for the highly sampled One discontinuity signal: (a) signal; (b) parameters.

Figure 3.13: Signal noise removal and parameter reconstruction comparison for the highly sampled Many jumps & discontinuities signal: (a) signal; (b) parameters.

Figure 3.14: Comparison of noise removal on the One discontinuity signal with noise STD = .375. Depicted are the residual noise images. Top left: TV. Top right: OP. Bottom left: K-SVD. Bottom right: NLOP.

Figure 3.16: Comparison of the reconstructed second parameter across the various algorithms when denoising the One discontinuity signal. Left: (a) NLOP vs. OP; note how far the OP reconstruction is from a piecewise constant solution, while generating an excellent denoising result (seen in the relevant graph). Right: (b) NLOP vs. K-SVD; note the apparent lack of a global constraint on the parameters in the K-SVD reconstruction.

Figure 3.5: Comparison of noise removal on the Many jumps & discontinuities signal with noise STD = .5. Depicted are the residual noise images. Top left: TV. Top right: OP. Bottom left: K-SVD. Bottom right: NLOP.

In Figure 3.7, we compare the noise removal performance on the nonlinear signals. We can see that the NLOP functional still exhibits the best performance, but it is now challenged by the OP functional. This is due to the strong constraints the NLOP functional imposes in trying to enforce the linear basis functions, i.e., in trying to find a piecewise linear solution for the given signal.

Figure 3.7: Signal noise removal comparison on the nonlinear signals. (a) Polynomial signal, (b) trig signal; each panel plots error norm against noise level for the non-local OP, OP, K-SVD and TV methods.

Indeed, to verify that our functional's performance is not restricted by the choice of basis functions, and that better performance is achieved when a better set of basis functions is chosen to model the signal domain, we performed several tests with higher-degree polynomials. In Figure 3.8, we display results achieved by denoising the polynomial signal of Figure 3.7 while changing the NLOP functional's basis functions to 2nd degree polynomials. In general, we expect a polynomial signal to be best recovered by polynomial basis functions of the same degree. This is clearly depicted in the graph in Figure 3.8, where NLOP with polynomial basis functions performs better than NLOP with linear basis functions.

Figure 3.8: Comparison of noise removal performance on the polynomial signal: the NLOP functional with linear basis functions (marked non-local OP) versus the NLOP functional with polynomial basis functions (marked non-local OP poly).

Another test with polynomial basis functions was performed on a C^1 continuous 2nd degree polynomial, displayed in Figure 3.9. This is an interesting case: both the signal and its first derivative are continuous, and only the second derivative is discontinuous. We found that this signal proved challenging for the MLS initialization method, causing it to misplace the point of discontinuity by several points; the location of this error depends on the random noise. The initialization error was not detected by the NLOP functional, which maintained it throughout the minimization process, as displayed for example in Figure 3.2. We wish to emphasize that the reconstructed solutions achieved by minimizing the NLOP functional have piecewise constant reconstructed parameters, which generate a piecewise smooth polynomial solution; this reconstructed signal may well be the signal from which the noisy signal was generated. Moreover, the solution achieved by the NLOP functional outperformed the K-SVD method, as displayed in the graphs in Figure 3.22.
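The role of the basis functions can be made concrete with a small sketch of the over-parameterized representation u(x) = sum_i a_i(x) * phi_i(x): piecewise-constant parameters over a linear basis {1, x} can only produce a piecewise-linear signal, while adding x^2 to the basis lets equally simple parameters express curvature — which is why matching the basis degree to the signal helps. The segment boundaries and parameter values below are made up for the illustration.

```python
import numpy as np

def synthesize(params, basis, x):
    # Over-parameterized representation: u(x) = sum_i a_i(x) * phi_i(x),
    # where each parameter a_i is itself a function of x (here: piecewise
    # constant over two segments).
    return sum(a * phi(x) for a, phi in zip(params, basis))

x = np.linspace(0.0, 1.0, 400)
seg = x < 0.5  # two segments with a discontinuity at x = 0.5

# Linear basis {1, x}: piecewise-constant parameters -> piecewise-linear u.
lin_basis = [lambda x: np.ones_like(x), lambda x: x]
a0 = np.where(seg, 1.0, -0.5)  # per-segment offsets
a1 = np.where(seg, 2.0, 4.0)   # per-segment slopes
u_lin = synthesize([a0, a1], lin_basis, x)

# Quadratic basis {1, x, x^2}: the same piecewise-constant parameter
# structure can now also represent curvature exactly.
quad_basis = lin_basis + [lambda x: x ** 2]
a2 = np.where(seg, 3.0, -3.0)  # per-segment curvatures
u_quad = synthesize([a0, a1, a2], quad_basis, x)
```

With noisy data, the denoising problem becomes recovering such near-piecewise-constant parameter functions, which is what the reconstructed-parameter figures depict.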

Figure 3.9: C^1 continuous 2nd degree polynomial. The point of second-derivative discontinuity is marked by a red cross. From left to right: clean signal, . STD noise, .375 STD noise.

Figure 3.2: C^1 continuous 2nd degree polynomial reconstruction. Displayed are the reconstructed signals and the v_AT indicator functions; the point of second-derivative discontinuity is marked by a red cross. From left to right: clean signal, . STD noise, .375 STD noise.

Figure 3.2: C^1 continuous 2nd degree polynomial reconstruction. Displayed are the reconstructed parameters; the point of second-derivative discontinuity is marked by a vertical black line. From left to right: clean signal, . STD noise, .375 STD noise.
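The MLS behavior discussed above — smearing, and hence possibly misplacing, a discontinuity — can be seen in a minimal moving-least-squares sketch: every point gets a local weighted linear fit, and wherever the window straddles a jump the fitted parameters blend both sides. This is a sketch under assumed settings (Gaussian weights of width h, a local linear model); the thesis' actual MLS initialization may differ in its details.

```python
import numpy as np

def mls_linear_init(x, y, h=0.05):
    # Moving least squares: at each sample xi, fit a local line
    # a0 + a1*(x - xi) with Gaussian weights of width h, and keep
    # (a0, a1) as the initial offset/slope parameters at that point.
    a0 = np.empty_like(y)
    a1 = np.empty_like(y)
    for i, xi in enumerate(x):
        w = np.exp(-0.5 * ((x - xi) / h) ** 2)
        X = np.stack([np.ones_like(x), x - xi], axis=1)
        # Weighted normal equations: (X^T W X) a = X^T W y
        A = X.T @ (w[:, None] * X)
        b = X.T @ (w * y)
        a0[i], a1[i] = np.linalg.solve(A, b)
    return a0, a1

x = np.linspace(0.0, 1.0, 200)
y = np.where(x < 0.5, 2.0 * x, 2.0 * x - 1.0)  # slope 2 with a jump at 0.5
off, slope = mls_linear_init(x, y)
```

Away from the jump the recovered slope is essentially exact, but within roughly a window-width of x = 0.5 the parameters are corrupted by the other segment — and with noise, the apparent location of the discontinuity can shift by several samples, exactly the initialization error described above.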

Figure 3.22: Signal noise removal and parameters reconstruction comparison for the C^1 continuous 2nd degree polynomial. (a) Signal, (b) parameters; each panel plots error norm against noise level for the non-local OP and K-SVD methods.

2D example

We also ran initial tests on a 2D example. The non-local over-parameterized method extends naturally to higher dimensions. We implemented the 2D case in a manner similar to the 1D case, and chose linear basis functions of the form

phi_1 = 1, phi_2 = x, phi_3 = y. (3.2)

In Figure 3.23, we show the 2D example and the NLOP functional's noise removal result. In Figure 3.24, we display the parameters reconstructed in the minimization process and the generated v_AT indicator function. We can see that the v_AT indicator function managed to segment the signal well, although it still delineated some ghost segments, especially near the image edges. Note that the recovered parameters are almost piecewise constant, as expected.
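With the linear basis {1, x, y}, the 2D representation reads u(x, y) = a1(x,y) + a2(x,y)*x + a3(x,y)*y, so piecewise-constant parameter fields generate a piecewise-planar image — consistent with the nearly piecewise-constant parameters seen in the reconstruction. A minimal sketch, with made-up segment values:

```python
import numpy as np

# 2D linear basis {1, x, y}, as in Eq. (3.2).
basis = [lambda x, y: np.ones_like(x),
         lambda x, y: x,
         lambda x, y: y]

def synthesize_2d(params, x, y):
    # u(x, y) = a1*1 + a2*x + a3*y with spatially varying parameters;
    # piecewise-constant parameter fields yield a piecewise-planar image.
    return sum(a * phi(x, y) for a, phi in zip(params, basis))

x, y = np.meshgrid(np.linspace(0.0, 1.0, 64), np.linspace(0.0, 1.0, 64))
left = x < 0.5  # two planar segments split down the middle
a1 = np.where(left, 0.2, 0.8)
a2 = np.where(left, 1.0, -1.0)
a3 = np.where(left, 0.0, 0.5)
u = synthesize_2d([a1, a2, a3], x, y)
```

Denoising in this model means recovering the three parameter fields (and a discontinuity indicator) from a noisy u, which is what Figure 3.24 displays.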

Figure 3.23: A 2D noise removal example of the 2D NLOP functional. (a) Original image. (b) Noisy image. (c) Denoised image. (d) Residual noise.

Figure 3.24: Reconstructed parameters and the v_AT (AT scheme) indicator function. (a) Parameter 1. (b) Parameter 2. (c) Parameter 3. (d) v_AT.


More information

Model-Based Stereo. Chapter Motivation. The modeling system described in Chapter 5 allows the user to create a basic model of a

Model-Based Stereo. Chapter Motivation. The modeling system described in Chapter 5 allows the user to create a basic model of a 96 Chapter 7 Model-Based Stereo 7.1 Motivation The modeling system described in Chapter 5 allows the user to create a basic model of a scene, but in general the scene will have additional geometric detail

More information

Local Image Registration: An Adaptive Filtering Framework

Local Image Registration: An Adaptive Filtering Framework Local Image Registration: An Adaptive Filtering Framework Gulcin Caner a,a.murattekalp a,b, Gaurav Sharma a and Wendi Heinzelman a a Electrical and Computer Engineering Dept.,University of Rochester, Rochester,

More information

Robust Geometry Estimation from two Images

Robust Geometry Estimation from two Images Robust Geometry Estimation from two Images Carsten Rother 09/12/2016 Computer Vision I: Image Formation Process Roadmap for next four lectures Computer Vision I: Image Formation Process 09/12/2016 2 Appearance-based

More information

Camera Calibration. Schedule. Jesus J Caban. Note: You have until next Monday to let me know. ! Today:! Camera calibration

Camera Calibration. Schedule. Jesus J Caban. Note: You have until next Monday to let me know. ! Today:! Camera calibration Camera Calibration Jesus J Caban Schedule! Today:! Camera calibration! Wednesday:! Lecture: Motion & Optical Flow! Monday:! Lecture: Medical Imaging! Final presentations:! Nov 29 th : W. Griffin! Dec 1

More information

Prof. Fanny Ficuciello Robotics for Bioengineering Visual Servoing

Prof. Fanny Ficuciello Robotics for Bioengineering Visual Servoing Visual servoing vision allows a robotic system to obtain geometrical and qualitative information on the surrounding environment high level control motion planning (look-and-move visual grasping) low level

More information

Hartley - Zisserman reading club. Part I: Hartley and Zisserman Appendix 6: Part II: Zhengyou Zhang: Presented by Daniel Fontijne

Hartley - Zisserman reading club. Part I: Hartley and Zisserman Appendix 6: Part II: Zhengyou Zhang: Presented by Daniel Fontijne Hartley - Zisserman reading club Part I: Hartley and Zisserman Appendix 6: Iterative estimation methods Part II: Zhengyou Zhang: A Flexible New Technique for Camera Calibration Presented by Daniel Fontijne

More information

Motion Analysis. Motion analysis. Now we will talk about. Differential Motion Analysis. Motion analysis. Difference Pictures

Motion Analysis. Motion analysis. Now we will talk about. Differential Motion Analysis. Motion analysis. Difference Pictures Now we will talk about Motion Analysis Motion analysis Motion analysis is dealing with three main groups of motionrelated problems: Motion detection Moving object detection and location. Derivation of

More information

Multi-stable Perception. Necker Cube

Multi-stable Perception. Necker Cube Multi-stable Perception Necker Cube Spinning dancer illusion, Nobuyuki Kayahara Multiple view geometry Stereo vision Epipolar geometry Lowe Hartley and Zisserman Depth map extraction Essential matrix

More information

Using temporal seeding to constrain the disparity search range in stereo matching

Using temporal seeding to constrain the disparity search range in stereo matching Using temporal seeding to constrain the disparity search range in stereo matching Thulani Ndhlovu Mobile Intelligent Autonomous Systems CSIR South Africa Email: tndhlovu@csir.co.za Fred Nicolls Department

More information

MAPI Computer Vision. Multiple View Geometry

MAPI Computer Vision. Multiple View Geometry MAPI Computer Vision Multiple View Geometry Geometry o Multiple Views 2- and 3- view geometry p p Kpˆ [ K R t]p Geometry o Multiple Views 2- and 3- view geometry Epipolar Geometry The epipolar geometry

More information

Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation

Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation Obviously, this is a very slow process and not suitable for dynamic scenes. To speed things up, we can use a laser that projects a vertical line of light onto the scene. This laser rotates around its vertical

More information

Multiple View Geometry

Multiple View Geometry Multiple View Geometry CS 6320, Spring 2013 Guest Lecture Marcel Prastawa adapted from Pollefeys, Shah, and Zisserman Single view computer vision Projective actions of cameras Camera callibration Photometric

More information

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Structured Light II Johannes Köhler Johannes.koehler@dfki.de Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Introduction Previous lecture: Structured Light I Active Scanning Camera/emitter

More information

CS664 Lecture #18: Motion

CS664 Lecture #18: Motion CS664 Lecture #18: Motion Announcements Most paper choices were fine Please be sure to email me for approval, if you haven t already This is intended to help you, especially with the final project Use

More information

Parameterization of triangular meshes

Parameterization of triangular meshes Parameterization of triangular meshes Michael S. Floater November 10, 2009 Triangular meshes are often used to represent surfaces, at least initially, one reason being that meshes are relatively easy to

More information