Discriminative Filters for Depth from Defocus

Fahim Mannan and Michael S. Langer
School of Computer Science, McGill University, Montreal, Quebec H3A 0E9, Canada
{fmannan, langer}@cim.mcgill.ca

Abstract

Depth from defocus (DFD) requires estimating the depth dependent defocus blur at every pixel. Several approaches for accomplishing this have been proposed over the years. For a pair of images this is done by modeling the defocus relationship between the two differently defocused images, and for single defocused images by relying on the properties of the point spread function and the characteristics of the latent sharp image. We propose depth discriminative filters for DFD that can represent many of the widely used models, such as relative blur, the Blur Equalization Technique, deconvolution based depth estimation, and subspace projection methods. We show that by optimizing the parameters of this general model we can obtain state-of-the-art results on synthetic and real defocused images, with single or multiple defocused images with different apertures.

1. Introduction

Finite aperture cameras can only focus at a single distance, or focal plane, at a time. Scene points outside that focal plane are defocused. When the defocus blur size is larger than a pixel, it becomes visible in an image. Blur size can be modeled using geometric optics and the thin-lens model in terms of the focal length, focus distance, aperture size, and scene depth. In real optics there are other factors, such as lens aberrations, diffraction, and the shape of the aperture, that add to the characteristics of the blur pattern (Fig. 1).

(a) Real PSF (7 x 7) [15]; (b) coded aperture [31]. Figure 1: Examples of PSFs from our dataset.

In depth from defocus, we are primarily concerned with the depth dependent variation in the blur or the point-spread function (PSF). If we can measure the blur from a given defocused image or an image pair with known camera parameters, then we can infer the depth of the corresponding scene points. Many different methods have been proposed over the years to solve this problem. The common principle in most of these methods is to apply a depth dependent transformation and evaluate the quality or the reconstruction error due to the transformation. Transformations that produce small (ideally zero) error are indicative of depth. We refer to these different transformations as different models.

There are two main contributions of this work. First, we present a depth discriminative filter based model and show that a large set of existing models are special cases of our model. Second, we propose to learn the parameters of this model under a uniqueness constraint. We experimentally show that our model can work with fewer parameters than comparable learning based approaches and is more flexible and accurate.

The paper is organized as follows: Sec. 2 reviews some of the work related to modeling depth from defocus. Sec. 3 shows that many of the existing depth from defocus models can be represented using a unified model. Sec. 4 proposes finding an optimal model using a data-driven approach. Depth discriminative filters obtained from our method are evaluated in Sec. 5 with synthetic and real images.

2. Related Work

The earliest and most widely used model for DFD is the relative blur model. For a pair of defocused images obtained using two different camera parameters, relative blur is the amount by which the sharper image is blurred to obtain the blurrier image.

The model is motivated by the observation that depth depends on the ratio of the Fourier transforms of the two defocused images. This model is the basis of several works that estimate depth in the frequency domain [18, 24]. Later, Ens and Lawrence [2] showed that relative blur estimation in the spatial domain performs better for noisy inputs. Most of these methods assume that the PSF is either a Gaussian or a pillbox. Watanabe and Nayar [27] proposed modeling the normalized ratio, i.e. the ratio between the difference and sum of the images, rather than the unnormalized versions of relative blur. Subbarao and Surya [25] proposed a spatial domain approach for modeling the relative blur by considering a polynomial model for the image. Later, Xian and Subbarao [29] extended the idea and used the commutative property of convolution with the Blur Equalization Technique (BET). In addition to these, there is also the non-blind deconvolution based approach used for coded apertures [13, 31]. It should be noted that many of the later works that use relative blur and deconvolution include some type of smoothness prior for improving depth estimation accuracy, e.g. [3, 6, 13, 14, 17, 20, 31].

Models such as relative blur, BET and deconvolution are primarily motivated by the blur formation process and aim to explain the defocus blur size by finding an equalizing transformation. On the other hand, subspace projection methods such as [4, 16] try to learn the transformation from the data. They model the problem as projection of the defocused images onto a depth dependent subspace. Favaro and Soatto [4] consider the null space of defocused images, while Martinello and Favaro [16] consider the rank space of defocused images. However, these methods do not try to ensure that the subspaces for different depths are independent from one another. Motivated by this, Wu et al. [28] proposed finding independent orthogonal subspaces. Their work is closest to ours. All these methods require a large number of basis vectors for the orthogonal subspaces. In contrast to these approaches, we only consider subspaces that are depth discriminative.

Filter based approaches such as rational filters [19, 27] or active DFD [8] are also similar to ours, in that their goal is to find optimal depth discriminative filters. However, their filters are designed manually, based on assumptions about the PSF or, in the case of active DFD, on the projected texture. Furthermore, they may have restrictions on camera parameters (e.g. only variable focus) or on the range of depth (e.g. only between the two focal planes). Compared to these methods, we learn the optimal filters from defocus data and do not make assumptions about the PSF. Other learning based methods dealing with image blur, e.g. [21, 22, 26, 30], do not estimate depth and are about deblurring an entire image. Some of these methods use convolutional neural networks (CNN) [7, 11] where the input image is convolved with a set of filters. However, those filters are different from ours in that they are used to reduce the number of parameters and to find shift-invariant features that are useful for later stages. Our filters are depth discriminative and only classify patches (i.e. the input is the same size as the filters) rather than an entire image.

3. Depth Discriminative Filter Model

Convolution as a Matrix-Vector Product. From the linear space-invariant model of blurred image formation, if the blur kernel corresponding to a scene point at depth d is h_d, and the latent sharp image is i_0, then the observed blurred image is

i_d = h_d * i_0.   (1)
In matrix notation, the above convolution can be written as x_d = H_d x_0, where H_d is the convolution matrix corresponding to the PSF h_d, and x_0, x_d are vectorized versions of the latent and observed images i_0 and i_d respectively. The vectorized image is formed by stacking all the image columns into a single column vector. The convolution matrix H_d can have different structures based on the boundary condition, but in general each row is a vectorized version of the spatial PSF h_d, with the elements of the PSF organized in such a way that the dot product with the image results in the final pixel value. More details on these convolution matrices can be found in [9].

Patch-centric View of Convolution. In this work, we consider image patches rather than an entire image. This results in the convolution being represented as a dot product between the PSF and an image patch, i.e., x_d(p) = h_d^T x_0^p. Here x_0^p is an image patch centered at pixel p (e.g. in the front defocused image at the top of Fig. 2), and x_d(p) is the value of the pixel in the observed image (i.e. the dotted pixel in the figure). To correctly represent convolution, h_d corresponds to a 180-degree rotated (i.e. flipped up-down and left-right) version of the PSF.
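As a concrete check of the matrix-vector and patch-centric views above, the following minimal numpy sketch (the toy image size and random PSF are our illustrative assumptions) builds a convolution matrix for a 'valid' boundary condition and verifies both identities:

import numpy as np
from scipy.signal import convolve2d

def conv_matrix(psf, img_shape):
    # Each row is a shifted, vectorized copy of the flipped PSF, so that
    # H @ x0.ravel() reproduces the 'valid' convolution h * i0.
    kh, kw = psf.shape
    ih, iw = img_shape
    k = psf[::-1, ::-1]                      # 180-degree rotated PSF
    rows = []
    for r in range(ih - kh + 1):
        for c in range(iw - kw + 1):
            row = np.zeros(img_shape)
            row[r:r + kh, c:c + kw] = k      # flipped PSF placed over one patch
            rows.append(row.ravel())
    return np.asarray(rows)

rng = np.random.default_rng(0)
i0 = rng.standard_normal((16, 16))           # latent sharp image
h = rng.random((5, 5)); h /= h.sum()         # toy depth-dependent PSF
H = conv_matrix(h, i0.shape)

# Matrix-vector view equals ordinary convolution ...
assert np.allclose(H @ i0.ravel(), convolve2d(i0, h, mode="valid").ravel())
# ... and the patch-centric view: one output pixel is the dot product of the
# flipped PSF with the image patch under it.
assert np.isclose(h[::-1, ::-1].ravel() @ i0[3:8, 4:9].ravel(),
                  convolve2d(i0, h, mode="valid")[3, 4])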

Figure 2: Depth from defocus can be considered as applying a set of depth discriminative filters (bottom-left) to a set of defocused image patches (top). These filters can either be analytically derived, or estimated from calibrated PSFs, or learned from defocused images. The filters can be represented using a set of matrices W = {W_1, ..., W_N}, W_d in R^{M x K^2 L}, where M is the number of filters of size K x K for L patches, and the depth d is in [1, N]. The estimated depth is d if, for any other depth d', ||W_d x||^2 < ||W_d' x||^2.

Depth Discriminative Filter Representation. We now generalize this idea of a dot product between a patch and a kernel to the problem of depth estimation. From here on, we will use x to denote a vectorized patch or a concatenation of patches centered around some pixel (as shown at the top-right of the figure). We consider the problem of depth estimation to be applying a set of depth discriminative filters to x, and choosing a depth based on the filter responses. For instance, to estimate depth at a pixel of a single image, we take a patch centered at that pixel and compute the magnitude of the response of a set of filters at every depth. The depth of the pixel is chosen to be the depth of the filter bank that gives the lowest value. For multiple differently defocused images of the same scene at a given depth (i.e. L > 1), we take the inner product of three-dimensional filters of size K x K x L with the patch-volume centered at the pixel. For depth d and vectorized patch-volume x, this operation can be represented as W_d x, where each row of W_d corresponds to L filters of size K x K, as shown in Fig. 2. In general, we consider M filter volumes, or a mapping from the K^2 L-dimensional defocused patch space to an M-dimensional space. Therefore, for a set of defocused image patches x in R^{K^2 L} and a set of depth discriminative filters W_d in R^{M x K^2 L}, the following problem is solved for each pixel:

argmin_d ||W_d x||^2.   (2)

DFD and Orthogonality. In the rest of this section, we represent some of the widely used DFD models using the above general formulation. These models are built on the assumption that in the absence of noise and modeling errors, for every depth d there exists W_d such that W_d x_d = 0, where x_d is any vectorized patch at depth d.

3.1. Relative Blur

The relative blur model assumes that the sharper image (x_S) and the blurrier image (x_B) are related by a depth dependent relative blur, i.e., in convolution matrix notation,

x_B = H_R(d) x_S + N(0, sigma_N^2).   (3)

Therefore, relative blur estimation can be written as the following optimization problem:

argmin_d ||[I, -H_R(d)] [x_B; x_S]||^2.   (4)

In the patch-level representation, w_d^T = [e_c^T, -h_R^T(d)], where c is the center-pixel's index. In terms of Fig. 2, at every depth M = 1 and L = 2.

Notation: we use x to denote some patch at an unknown depth; a subscript on x, e.g. x_d, to denote any patch at depth d; x_t to denote the t-th patch in a training set; and x_1, x_2, x_S, x_B to denote the 1st, 2nd, sharper and blurrier images or patches depending on the context.
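Before generalizing to non-Gaussian PSFs below, a small sketch illustrates Eqs. 3-4 at the patch level. The Gaussian relative-blur kernels, the noise-free synthetic images, and the single evaluation pixel are simplifying assumptions of ours, not the paper's settings:

import numpy as np
from scipy.signal import convolve2d

def gauss_kernel(sigma, size=9):
    ax = np.arange(size) - size // 2
    g = np.exp(-0.5 * (ax / max(sigma, 1e-6)) ** 2)
    k = np.outer(g, g)
    return k / k.sum()

sigmas = [0.5, 1.0, 1.5, 2.0, 2.5]            # one relative-blur sigma per depth
rng = np.random.default_rng(1)
x_S = rng.standard_normal((64, 64))           # sharper image (stand-in texture)
true_d = 3
x_B = convolve2d(x_S, gauss_kernel(sigmas[true_d]), mode="same")  # blurrier image

def relative_blur_cost(x_S, x_B, sigma, p=(32, 32), size=9):
    # ||w_d^T x||^2 with w_d^T = [e_c^T, -h_R^T(d)] evaluated at pixel p:
    # the center pixel of x_B minus the relative-blur prediction from x_S.
    r, c = p
    half = size // 2
    patch = x_S[r - half:r + half + 1, c - half:c + half + 1]
    k = gauss_kernel(sigma, size)[::-1, ::-1]  # flipped relative-blur PSF
    return (x_B[r, c] - np.sum(k * patch)) ** 2

costs = [relative_blur_cost(x_S, x_B, s) for s in sigmas]
print("estimated depth index:", int(np.argmin(costs)), "true:", true_d)

In practice the squared residual is aggregated over all pixels in a window rather than evaluated at a single pixel.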

In the absence of noise and for Gaussian PSFs, we have w_d^T x_d = 0. When the PSFs are not Gaussian, we can formulate relative blur estimation as finding the closest orthogonal vector to the observed defocused images. That is,

argmin_{h_R(d)} ||[e_c^T, -h_R^T(d)] [x_B; x_S]||^2   (5)
subject to ||h_R||_1 = 1, h_R >= 0.

3.2. Blur Equalization Technique

Blur equalization relies on the commutative property of convolution, which in matrix form can be expressed as

H_2(d) x_1 = H_1(d) x_2 + N(0, sigma_N^2).   (6)

This results in the following depth estimation problem:

argmin_d ||[H_2(d), -H_1(d)] [x_1; x_2]||^2.   (7)

In terms of the patch-level representation, w_d^T = [h_2^T(d), -h_1^T(d)], and M = 1, L = 2. In the absence of noise and PSF modeling error, w_d is orthogonal to the observed defocused images at depth d, i.e., w_d^T x_d = 0.

3.3. Deconvolution

In the deconvolution based depth estimation approaches, the first step is to obtain a deconvolved image patch for different depth hypotheses. Then the quality of the reconstruction is evaluated. The depth hypothesis that results in the highest quality sharp patch (e.g. sharp with no ringing artifacts) is taken as the depth label. For known PSFs H_d at depth d, the deconvolution based formulation for DFD is

argmin_{x_0, d} ||H_d x_0 - x||^2 + lambda ||C x_0||^2.   (8)

Here x_0 is the latent sharp patch and H_d is a convolution matrix or a concatenation of matrices for a given depth; similarly, x can be a single patch or L patches. In the last term, C is a matrix representing a prior for natural images. The goal is to jointly find the latent patch and the depth estimate that explain the observed blurred patches. A typical way to solve this problem is to first find a deconvolved patch for a given blur PSF H_d under the depth hypothesis d:

x0_hat(d) = H_d^+ x,   (9)

where H_d^+ is an inversion matrix for depth d. Solving Eq. 8 for x_0 gives the inversion matrix

H_d^+ = (H_d^T H_d + lambda C^T C)^{-1} H_d^T.   (10)

But there are also other ways of obtaining an inversion matrix, such as the Truncated Singular Value Decomposition (TSVD), where the prior on patches amounts to discarding the singular vectors corresponding to singular values below some threshold. In this case, the inversion matrix is

H_d^+ = V_k Sigma_k^{-1} U_k^T,   (11)

where H_d = U Sigma V^T and U_k, Sigma_k, V_k correspond to the first k singular vectors. Hence H_d^+ is the truncated inverse of the convolution matrix for depth d. The truncation threshold can be determined from spectral analysis of the PSF [9].

The above finds the latent sharp patch. For depth estimation, we find which depth dependent PSF produces the correct sharp patch. This can be found from the reconstruction error of the observed patch, i.e.,

argmin_d ||(I - H_d H_d^+) x||^2.   (12)

Therefore, we have W_d = I - H_d H_d^+. It can be shown that W_d is symmetric (i.e. W_d^T = W_d) and idempotent (i.e. W_d^T W_d = W_d), and therefore W_d is an orthogonal projection operator. Furthermore, in the absence of noise and modeling error, W_d x_d = 0. In terms of our discriminative filter representation, M = K^2 L, where the number of observed patches is L = 1 or 2.

3.4. Subspace Projection

Based on the orthogonal projection form of W_d in Eq. 12, [4] proposes to find orthogonal projectors from known PSFs or to learn them from defocused images. An orthogonal projector is a matrix H_perp with the property that H_perp = U U^T, where U is an orthonormal matrix. For known PSFs, these orthogonal projectors are derived from the PSF, but for unknown PSFs they are learned from a training set of defocused images using the singular value decomposition. If the columns of U_N correspond to the smallest singular values, then H_perp = U_N U_N^T.
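A small numpy sketch connects Eqs. 11-12 with the projector property just described. The random stand-ins for the convolution matrices and the truncation rank k = 48 are our illustrative choices, not matrices built from calibrated PSFs:

import numpy as np

def tsvd_residual_projector(H, k):
    # W_d = I - H_d H_d^+ with H_d^+ = V_k Sigma_k^{-1} U_k^T (Eqs. 11-12).
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    H_pinv = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T
    return np.eye(H.shape[0]) - H @ H_pinv

rng = np.random.default_rng(2)
H1 = rng.standard_normal((64, 64))    # stand-ins for the convolution matrices
H2 = rng.standard_normal((64, 64))    # of two different depth hypotheses

W1 = tsvd_residual_projector(H1, k=48)
# W is symmetric and idempotent, i.e. an orthogonal projection operator:
assert np.allclose(W1, W1.T) and np.allclose(W1 @ W1, W1)

# A patch blurred by H1 (latent patch in the retained right subspace) has
# near-zero residual under W1 but generally not under the wrong-depth W2.
Vt1 = np.linalg.svd(H1, full_matrices=False)[2]
x = H1 @ (Vt1[:48].T @ rng.standard_normal(48))
W2 = tsvd_residual_projector(H2, k=48)
print(np.linalg.norm(W1 @ x), np.linalg.norm(W2 @ x))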
Depth estimation can then be expressed as a matrix-vector multiplication between the left null space vectors and the input images, i.e. W_d = U_N^T(d):

argmin_d ||U_N^T(d) x||^2.   (13)

Martinello and Favaro proposed an equivalent formulation in [16] where the images are projected onto the rank space U_R, i.e., W_d = U_R^T(d):

argmin_d ||U_R^T(d) x||^2.   (14)

In both cases, we need to choose the number of vectors (i.e. M in Fig. 2) that form the basis of the subspace.
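The null space filters of Eq. 13 can be sketched directly from data. In the toy example below, the per-depth subspace structure is simulated with random bases rather than real defocused patches, which is our simplification:

import numpy as np

def nullspace_filters(patches, M):
    # Stack vectorized defocused patches as columns and keep the M left
    # singular vectors with the smallest singular values: W_d = U_N^T (Eq. 13).
    X = np.stack(patches, axis=1)
    U = np.linalg.svd(X, full_matrices=True)[0]
    return U[:, -M:].T

rng = np.random.default_rng(3)
dim = 81                                       # e.g. 9 x 9 patches, L = 1
# Toy depth classes: patches at each depth lie near a low-dimensional subspace.
bases = [np.linalg.qr(rng.standard_normal((dim, 40)))[0] for _ in range(2)]
train = [[B @ rng.standard_normal(40) for _ in range(500)] for B in bases]

W = [nullspace_filters(t, M=10) for t in train]

x = bases[0] @ rng.standard_normal(40)         # a test patch from depth 0
costs = [np.linalg.norm(Wd @ x) ** 2 for Wd in W]
print("estimated depth:", int(np.argmin(costs)))   # expected: 0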

Usually these bases are on the order of hundreds of vectors. As the number of vectors and the number of depth levels grow, the computational cost grows significantly. Another limitation is that the subspace projection methods compute the subspace of each depth independently of the other depths. This does not enforce the subspaces to be maximally independent from one another. Based on this observation, Wu et al. [28] proposed an extension of subspace projection where they learn subspaces that are discriminative: for any two depths i and j, they enforce ||W_i x_i||^2 < ||W_j x_i||^2 for all j != i. For a patch of size K x K, the discriminative metric W is a matrix of size K^2 x K^2. Therefore, for L defocused images, M = K^2 L.

To summarize, we have shown that many DFD models can be expressed in the common representation W_d x. More specifically, for the different models we have shown how to construct W_d. We have also shown under what conditions they satisfy the orthogonality assumption, i.e. W_d x_d = 0.

4. Learning Depth Discriminative Filters

In the last section, we expressed the problem of depth from defocus as solving Eq. 2. The argmin operator has a unique solution when

||W_d x_d||^2 < ||W_d' x_d||^2  for all d' != d.   (15)

Different models result in different forms of W_d. The underlying assumption in all of them is that when W_d is orthogonal to the defocused images x_d at depth d, i.e., W_d x_d = 0, then Eq. 15 is satisfied. In this section, we first discuss the limitations of the existing models and their assumptions, then propose our approach and discuss its advantages and disadvantages.

Limitations of Existing Models. The existing models, except for the learning based approaches, are based on certain assumptions about the image formation process. They perform well when those assumptions hold. In general, the accuracy of the modeling assumptions can change with camera parameters and depth. For instance, relative blur may perform better when the aperture is smaller, because in that case the PSFs can be modeled by a Gaussian. For a larger aperture (i.e. less diffraction) with a complex shape, BET or deconvolution would perform better than relative blur. Also, as the size of the PSF changes with depth, different models may perform differently. The subspace projection based methods learn from data and do not have the above limitations. However, they may not necessarily satisfy Eq. 15. While the discriminative metric learning approach enforces Eq. 15, it requires learning a large matrix and also requires a large number of convolution operations during evaluation.

Our Proposed Approach. We propose to directly learn filters that satisfy Eq. 15 from a set of fronto-parallel defocused images collected at different depths. This leads to the following constrained optimization problem:

argmin_W ρ(W)   (16)
subject to ||W_{y_t} x_t||^2 < ||W_j x_t||^2  for all t, j with j != y_t.

Here x_t is the t-th training image patch and y_t is its associated depth. ρ(W) is a regularization function on the filters, which in our case is the squared Frobenius norm. The above optimization problem can be solved in many ways. In this work, we transform it into the following unconstrained optimization problem using the hinge loss [1, 12]:

argmin_W  λ_1 ρ(W) + (λ_2 / N) Σ_{t,j} ρ_l(y_t, j) max(0, E_{y_t} - E_j + m).   (17)

Here E_{y_t} = ||W_{y_t} x_t||^2, E_j = ||W_j x_t||^2, and m is the margin between them. The weighting function ρ_l is an additional weight that depends on the true label y_t of the t-th example and any other label j. It ensures that wrong labels that are farther from the correct label are penalized more than those that are closer. As a result, this weighting maximizes the curvature of the cost function at the true depth and in turn minimizes the variance of the estimate.
In our experiments we use E = log(||W x||^2), margin m = 1, and ρ_l(y_t, j) = (a |y_t - j|)^k, with k in [0, 1] and a in (0, 1].

Advantages. Our approach allows the filters to be orthogonal to defocused images. For instance, the individual rows can be relative blur or BET filters, as long as the constraint in Eq. 16 is satisfied. In fact, our solution allows both models to co-exist even though they are not orthogonal to each other, which is a requirement for [4, 16, 28]. Our approach differs from Wu et al. [28] in that we do not require W to be a projection operator, and as a result we solve a smaller optimization problem. In our approach, we choose the kernel size K and the number of filters M per depth, where M << K^2 L. This way we force the optimizer to find the best possible M depth discriminative filters per depth.

Disadvantages. We share some of the disadvantages of other learning based approaches. For instance, we need to learn the filters for each camera configuration. Similar to [28], our approach requires jointly learning all the filters, which is a computationally expensive process. But unlike Wu et al., we optimize fewer variables, i.e., M K^2 L N instead of K^4 L^2 N.
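The following PyTorch sketch mirrors Eq. 17 with the log-energy and the Adam update described above. The shapes, hyperparameter values and synthetic training data are placeholder assumptions of ours, not the paper's settings:

import torch

N_DEPTHS, M, DIM = 8, 10, 81           # depth levels, filters per depth, K*K*L
MARGIN, A, KEXP = 1.0, 1.0, 0.5        # m, a, k in rho_l(y, j) = (a*|y - j|)**k
LAM_REG = 1e-4

torch.manual_seed(0)
# Toy labeled patches: one low-dimensional subspace per depth level.
bases = [torch.linalg.qr(torch.randn(DIM, 40))[0] for _ in range(N_DEPTHS)]
x_train = torch.cat([(B @ torch.randn(40, 256)).T for B in bases])
y_train = torch.arange(N_DEPTHS).repeat_interleave(256)

W = torch.nn.Parameter(0.01 * torch.randn(N_DEPTHS, M, DIM))
opt = torch.optim.Adam([W], lr=1e-2)

for step in range(500):
    # E[t, d] = log ||W_d x_t||^2 for every patch and every depth hypothesis.
    resp = torch.einsum("dmk,tk->tdm", W, x_train)
    E = torch.log((resp ** 2).sum(dim=2) + 1e-8)
    E_true = E.gather(1, y_train[:, None])              # E_{y_t}
    dist = (y_train[:, None] - torch.arange(N_DEPTHS)[None, :]).abs().float()
    rho_l = (A * dist) ** KEXP                          # zero at j = y_t
    hinge = torch.clamp(E_true - E + MARGIN, min=0.0)   # max(0, E_y - E_j + m)
    loss = LAM_REG * (W ** 2).sum() + (rho_l * hinge).mean()
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                  # depth prediction: argmin_d ||W_d x||^2
    resp = torch.einsum("dmk,tk->tdm", W, x_train)
    pred = (resp ** 2).sum(dim=2).argmin(dim=1)
print("training accuracy:", (pred == y_train).float().mean().item())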

Existence of a Solution. For certain camera configurations and aperture shapes, Eq. 15 may not be satisfiable. For instance, if the aperture is circularly symmetric and two images are taken with two different apertures with focus at the center of the scene, then scene points equidistant from the focal plane and on opposite sides of it will produce the same PSFs. In such cases, depth from defocus cannot be performed.

5. Experiments

In this section we look at results for both synthetic and real defocus blur experiments. We show that the optimization process reduces the mean and variance of the depth estimates. Our experiments use both single images and pairs of images.

General Setup. In all the experiments, the training and test images are formed from different sets of images. The training set is formed by taking the training images at different depths, partitioning them into blocks of size equal to the kernel size, and only keeping patches that have variance above some threshold. We choose the number of filters and their size, and initialize them either randomly or using the left null space vectors corresponding to the lowest singular values; starting with the left null space vectors usually results in faster convergence. Eq. 17 is optimized using batch gradient descent with the Adam update rule [10].

Image Dataset. Synthetic defocused images are generated by taking a set of sharp textures and then blurring them with differently scaled PSFs. For real images, we collected a set of defocused images at depths uniformly spaced in inverse depth, with 7 different apertures and 9 different focus distances. We present results for a subset of these configurations. For comparison with other methods such as relative blur, BET and deconvolution, we first calibrate the PSFs for all the configurations using [15]. For the relative blur method, the relative blur kernels for the relevant configurations are also estimated using [15], which is a version of Eq. 5. In all the real image based experiments, the filters are trained on the 1/f (Fig. 4a) and Flower (Fig. 4b) images and tested on the Bark images (Fig. 4c).

5.1. Depth from a Single Defocused Image

Synthetically Defocused with a Coded Aperture. Fig. 3 shows the results for estimating depth from a single coded aperture image. Both the training and test images are synthetically generated using PSF kernels ranging in size from the one shown in Fig. 1b down to a single pixel. For each blur level, a bank of filters was learned from random initialization.
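A sketch of the synthetic data generation step described above; the Gaussian PSFs, the patch size, and the variance threshold are placeholder choices of ours:

import numpy as np
from scipy.signal import convolve2d

def gauss_psf(sigma, size=9):
    ax = np.arange(size) - size // 2
    g = np.exp(-0.5 * (ax / max(sigma, 1e-6)) ** 2)
    k = np.outer(g, g)
    return k / k.sum()

def make_patches(texture, psf, ksize, var_thresh=1e-3):
    # Blur a sharp texture with a depth-specific PSF, partition the result
    # into kernel-sized blocks, and keep only sufficiently textured patches.
    blurred = convolve2d(texture, psf, mode="valid")
    h, w = blurred.shape
    patches = [blurred[r:r + ksize, c:c + ksize].ravel()
               for r in range(0, h - ksize + 1, ksize)
               for c in range(0, w - ksize + 1, ksize)]
    return [p for p in patches if p.var() > var_thresh]

rng = np.random.default_rng(4)
texture = rng.standard_normal((256, 256))      # stand-in for a sharp texture
sigmas = np.linspace(0.3, 3.0, 8)              # one blur scale per depth level
dataset = {d: make_patches(texture, gauss_psf(s), ksize=9)
           for d, s in enumerate(sigmas)}
print({d: len(p) for d, p in dataset.items()})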
00) of filters. For instance, when the PSF is a elta function then orthogonal projection base methos can have large error in the estimates, i.e. the rightmost column of Figs. e an f have off-iagonal minima. Real Defocuse Images with a Conventional Aperture Fig. shows the mean an stanar eviation of inverse

(a) 1/f noise; (b) Brodatz Flower; (c) Brodatz Bark. Figure 4: Textures rendered on the display as captured by the camera. For all these images the object-to-sensor distance, focus distance, aperture and exposure were held fixed. We only use the center part of the captured images for our experiments.

Figure 5: Mean and standard deviation of inverse depth estimates from a single real defocused image of a tree bark texture (Fig. 4c), taken with a conventional aperture and focus at 0. m.

For each depth, a fixed number of filters of the same size is used. As the blur size gets larger, it becomes more difficult to discriminate depth. However, compared to the null space based filters of equivalent rank, our optimized filters perform significantly better: the average depth only deviates from the correct depth when the blur gets very large, and for smaller blur sizes the variance is smaller.

5.2. Depth from a Defocused Pair

Trained using Synthetically Defocused Images. Fig. 6 shows results for Gaussian PSFs. The training set is synthetically generated assuming a scene spanning a fixed range of distances and a camera with a fixed focal length and f-number; the largest blur radius is a few pixels. The blur scale is divided into discrete depth levels, and natural image textures are synthetically blurred to generate the training and test sets. In all cases, the training and test sets contain different image textures.

The strength of our approach is illustrated by Figs. 6a and 6b, which show the average cost for the null space filters and for our optimized filters, respectively. Each column corresponds to a depth, and each row to the response of the filter bank for the corresponding depth. Each cell is the average response over a few thousand patches at a given depth; each column is therefore the cost function for a given input depth. The columns are normalized to highlight the shape of the cost function. Ideally, we want a strong diagonal. From the plots, we can see that our approach produces a stronger diagonal than the same number of null space filters. The null space filters also show that near the scene boundaries the cost function gets flatter, so we would expect high variance in the depth estimates in those regions. This is visible in the depth maps estimated from the real defocused images of [27] (see Fig. 7): the top edge of the cup is noisier than with our optimized filters. We use filters that are significantly smaller than the 7 x 7 filters used by [27].

(a) Average cost; (b) our average cost. Figure 6: Average cost for a Gaussian defocus pair. Each column is an average over a few thousand image patches. Blue indicates 0 and bright yellow 1. Our method forces the minimum to be on the diagonal.
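The average-cost matrices of Fig. 6 make a useful diagnostic. A self-contained sketch (toy filter banks and simulated patches, with per-column normalization as in the figure, all our own stand-ins):

import numpy as np

def average_cost_matrix(W, patches_by_depth):
    # C[d, d_true] = mean ||W_d x||^2 over patches x at depth d_true, with each
    # column rescaled to [0, 1] as in Fig. 6.
    N = len(W)
    C = np.zeros((N, N))
    for dt, patches in enumerate(patches_by_depth):
        X = np.stack(patches, axis=1)
        for d in range(N):
            C[d, dt] = np.mean(np.sum((W[d] @ X) ** 2, axis=0))
    C -= C.min(axis=0, keepdims=True)
    C /= C.max(axis=0, keepdims=True) + 1e-12
    return C

rng = np.random.default_rng(5)
N, M, dim = 6, 10, 81
bases = [np.linalg.qr(rng.standard_normal((dim, 40)))[0] for _ in range(N)]
patches = [[B @ rng.standard_normal(40) for _ in range(200)] for B in bases]
# Null space filter banks (Eq. 13) for each depth:
W = [np.linalg.svd(np.stack(p, axis=1))[0][:, -M:].T for p in patches]

C = average_cost_matrix(W, patches)
print(np.argmin(C, axis=0))            # ideally 0..N-1: minima on the diagonal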

Figure 7: Top: the Breakfast dataset from Watanabe et al. [27]; (a) near-focused and (b) far-focused inputs. Bottom: results using (c) null space filters and (d) our depth discriminative filters, with the same M and K for both, post-processed using a median filter. Our optimized filters have lower noise for the cup.

Trained using Real Defocused Images. Two different camera settings are used for the real image-pair based experiments shown in Fig. 8. The first setting is variable focus, where the aperture is fixed and the two images are taken at two different focus distances. The second setting is variable aperture, where the focus distance is fixed at 0. m and the two images are taken with two different apertures. In both cases, we compare our proposed method to relative blur, BET and deconvolution (left column of the figure), and to filters from the left null space (right column). Both the variable focus case (top row of Fig. 8) and the variable aperture case (bottom row) use a small, fixed number of filters per depth for the image pair. In all cases, our method works as well as or better than the other methods. In these experiments, most of the methods worked better in the variable focus case, because the maximum blur size is smaller for variable focus than for variable aperture.

Figure 8: Mean and standard deviation of inverse depth estimates from the real image based experiments, plotting estimated against true inverse depth. The top row corresponds to the variable focus setting and the bottom row to the variable aperture setting. The left column compares our estimated filters with relative blur, BET and deconvolution, and the right column with an equivalent number of null space filters (Eq. 13).

6. Conclusion

We have shown that DFD can be represented using a general depth discriminative filter based model. Our model specifies a matrix W_d for every depth. To demonstrate its generality, we expressed a large subset of existing models using our representation; these different models correspond to different forms of W_d. Using a single representation also allows us to see the similarities and differences between these models. In particular, we showed that most of the models prefer ||W_d x_d||^2 = 0 under ideal conditions, e.g. noise-free inputs and no modeling error. We argued that the main requirement for finding unique depth estimates is for the W_d's to satisfy Eq. 15, and that the condition ||W_d x_d||^2 = 0 alone is not sufficient for ensuring that Eq. 15 holds. This led us to propose a learning based approach where we learn W_d's that satisfy Eq. 15. We experimentally compared our learning based approach with other models using both synthetic and real images, with different types of apertures and numbers of images (i.e. single or pair). For single image based experiments, we found our approach to work better than subspace projection based methods with an equivalent number of filters. For a pair of images, we have shown that our approach works as well as the other models and in some cases even better. By looking at the average of ||W x||^2 over a large number of images (i.e. Figs. 3 and 6), we can see that our approach on average gives the correct solution (i.e. on the diagonal) with low variance (concentrated near the diagonal). To conclude, we proposed a novel generalized formulation of the DFD problem and showed that we can learn the parameters of our model by enforcing a uniqueness constraint.
We have experimentally verified that our proposed approach works well on a range of datasets.

References

[1] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res., 2:265-292, 2001.
[2] J. Ens and P. Lawrence. An investigation of methods for determining depth from focus. PAMI, 15(2):97-108, 1993.
[3] P. Favaro. Recovering thin structures via nonlocal-means regularization with application to depth from defocus. In CVPR, June 2010.
[4] P. Favaro and S. Soatto. A geometric approach to shape from defocus. PAMI, 27(3):406-417, March 2005.
[5] P. Favaro and S. Soatto. 3-D Shape Estimation and Image Restoration: Exploiting Defocus and Motion Blur. Springer, 2007.
[6] P. Favaro, S. Soatto, M. Burger, and S. Osher. Shape from defocus via diffusion. PAMI, 30(3):518-531, March 2008.
[7] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193-202, 1980.
[8] O. Ghita, P. F. Whelan, and J. Mallon. Computational approach for depth from defocus. Journal of Electronic Imaging, 14(2), 2005.
[9] P. Hansen, J. Nagy, and D. O'Leary. Deblurring Images: Matrices, Spectra, and Filtering. Society for Industrial and Applied Mathematics, 2006.
[10] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[11] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, Nov. 1998.
[12] Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, and F. Huang. A tutorial on energy-based learning. In Predicting Structured Data. MIT Press, 2006.
[13] A. Levin, R. Fergus, F. Durand, and W. T. Freeman. Image and depth from a conventional camera with a coded aperture. ACM Trans. Graph., 26(3):70, 2007.
[14] F. Li, J. Sun, J. Wang, and J. Yu. Dual-focus stereo imaging. Journal of Electronic Imaging, 19:043009, 2010.
[15] F. Mannan and M. S. Langer. Blur calibration for depth from defocus. In CRV, 2016.
[16] M. Martinello and P. Favaro. Single image blind deconvolution with higher-order texture statistics. In Video Processing and Computational Video: International Seminar, Dagstuhl Castle, Germany, October 10-15, 2010, Revised Papers. Springer Berlin Heidelberg, 2011.
[17] V. Namboodiri, S. Chaudhuri, and S. Hadap. Regularized depth from defocus. In ICIP, Oct. 2008.
[18] A. P. Pentland. A new sense for depth of field. PAMI, 9(4):523-531, July 1987.
[19] A. N. J. Raj and R. C. Staunton. Rational filter design for depth from defocus. Pattern Recognition, 45(1):198-207, 2012.
[20] A. Rajagopalan and S. Chaudhuri. An MRF model-based approach to simultaneous recovery of depth and restoration from defocused images. PAMI, 21(7):577-589, July 1999.
[21] U. Schmidt, C. Rother, S. Nowozin, J. Jancsary, and S. Roth. Discriminative non-blind deblurring. In CVPR, June 2013.
[22] C. J. Schuler, M. Hirsch, S. Harmeling, and B. Schölkopf. Learning to deblur. TPAMI, 38(7):1439-1451, July 2016.
[23] S. M. Seitz and S. Baker. Filter flow. In ICCV, 2009.
[24] M. Subbarao. Parallel depth recovery by changing camera parameters. In ICCV, pages 149-155, Dec. 1988.
[25] M. Subbarao and G. Surya. Depth from defocus: A spatial domain approach. IJCV, 13(3):271-294, 1994.
[26] J. Sun, W. Cao, Z. Xu, and J. Ponce. Learning a convolutional neural network for non-uniform motion blur removal. In CVPR, pages 769-777, June 2015.
[27] M. Watanabe and S. K. Nayar. Rational filters for passive depth from defocus. IJCV, 27(3):203-225, 1998.
[28] Q. Wu, K. Wang, W. Zuo, and Y. Chen. Depth from defocus via discriminative metric learning. In Neural Information Processing: 18th International Conference, ICONIP 2011, Shanghai, China, November 13-17, 2011, Proceedings, Part III. Springer Berlin Heidelberg, 2011.
[29] T. Xian and M. Subbarao. Depth-from-defocus: Blur equalization technique. In Proc. SPIE 6382, 2006.
[30] L. Xu, J. S. Ren, C. Liu, and J. Jia. Deep convolutional neural network for image deconvolution. In NIPS, pages 1790-1798, 2014.
[31] C. Zhou, S. Lin, and S. K. Nayar. Coded aperture pairs for depth from defocus and defocus deblurring. IJCV, 93(1):53-72, May 2011.