Automatic Reflection Removal using Gradient Intensity and Motion Cues


Chao Sun chaorensx@126.com, Bing Zeng eezeng@uestc.edu.cn, Shuaicheng Liu liushuaicheng@uestc.edu.cn, Zhengning Wang zhengning.wang@uestc.edu.cn, Taotao Yang taotaoyangt@gmail.com, Guanghui Liu guanghuiliu@uestc.edu.cn

ABSTRACT

We present a method to separate the background image and the reflection from two photos taken in front of a transparent glass under slightly different viewpoints. In our method, the SIFT-flow between the two images is first calculated, and a motion hierarchy is constructed from the SIFT-flow at multiple levels of spatial smoothness. To distinguish background edges from reflection edges, we calculate a motion score for each edge pixel from its variance along the motion hierarchy. Additionally, we use superpixels to group edge pixels into edge segments and calculate the motion scores by averaging over each segment. In the meantime, we also calculate an intensity score for each edge pixel from its gradient magnitude. We combine the motion and intensity scores into a combination score, and a binary labelling (for separation) is obtained by thresholding the combination scores. The background image is finally reconstructed from the separated gradients. Compared with existing approaches that require a sequence of images or a video clip for the separation, we need only two images, which largely improves practical feasibility. Various challenging examples are tested to validate the effectiveness of our method.

1. INTRODUCTION

We study in this paper how to separate reflections when images are taken in front of a transparent glass. In this circumstance, we end up with a mixture of two scenes: the scene beyond the glass and the scene reflected by the glass. Intuitively, one solution is to separate the blended image into the background layer and the reflectance layer so that they can be observed separately [17].
MM '16, October 15-19, 2016, Amsterdam, Netherlands. (c) 2016 ACM. ISBN 978-1-4503-3603-1/16/10. DOI: http://dx.doi.org/10.1145/2964284.2967264

Some previous methods require a polarizer to capture multiple images with a static camera under different polarization angles [5, 6, 13, 16]. These methods employ different models and priors, and can achieve good results if the underlying models/priors are well fitted. However, a polarizer is often not available in practice. Several methods have instead applied a flash [1] or varying focus [15]. Moreover, Levin et al. [7, 9] proposed to add user interactions that manually label pixels into two layers, in which image gradients are assigned to either the background or the foreground and the result is reconstructed by an optimization in the gradient domain. However, manual labelling is tedious. Meanwhile, Levin et al. [9] also proposed an automatic approach that minimizes the total number of edges and corners in the recovered layers according to the image's intrinsic statistics. Another class of methods utilizes motion, i.e., misalignments of image contents between the two layers serve as effective cues for layer separation. For instance, Szeliski et al. [17] and Irani et al. [14] employed misalignments in the intensity domain, while Gai et al. [3, 4] utilized misalignments in the gradient domain. More recently, Li et al.
[10] used the SIFT-flow [12] to align several images, which automatically labels the registered gradient edges, and Xue et al. [18] proposed a computational approach for obstruction separation using video clips. In summary, these methods have made use of motion differences between layers for optimization. However, they require either several images or a video clip, which may not be practical. In our work, we propose an approach that requires only the minimum number of images, namely two, for providing motion cues. Instead of aligning images using a standard SIFT-flow [10], we compute a motion hierarchy from SIFT-flows at a number of varying spatial-smoothness levels. Our observation is that background edges tend to have uniform motions across the hierarchy while reflected edges exhibit larger motion variations. Thus, motion scores are derived for the edge pixels to distinguish background edges from reflection edges. Edge pixels can further be grouped into edge segments according to image superpixels [2], and motion scores are averaged over each segment for better smoothness. In the meantime, background edges usually have larger intensities than reflection edges, which yields the edge intensity scores. Then, we leverage the gradient intensities and motion cues for the separation. After labelling edges as either background or reflection, the background can be recovered by an optimization under sparsity priors [7, 9].

Figure 1: Pipeline of our system. (a) and (b) are the source and target input images. (c) An intensity score is derived from gradient edges. (d) The SIFT-flow hierarchy is computed with different levels of smoothness, which produces motion scores for edge segments that are grouped by image superpixels (shown in (e)). We combine the two scores to separate the edges into (f) reflectance edges and (g) background edges. (h) The final background image is recovered from the background edges. Optionally, a reflectance image (i) can also be reconstructed. Note that the light spot located at the upper right corner of the image belongs to the background; it is labeled correctly due to the use of the motion score.

2. OUR METHOD

Figure 1 shows our pipeline. The inputs to our system are two images taken in front of a glass from two slightly varying viewpoints. One image is selected as the source image and the other is specified as the target image. Without loss of generality, we only describe the method for removing the reflectance of the source image. Notably, the source and target images can be exchanged so that the same method can be applied to remove the reflection in the target image. We first generate the edge image (the image gradients) by the Sobel operator, as shown in Fig. 1(c). We also generate the motion hierarchy (from the SIFT-flow) as shown in Fig. 1(d). Intensity scores and motion scores are then calculated for the separation. The former is motivated by our observation that reflected edges are relatively weaker than background edges. The latter is obtained from the SIFT-flow through superpixel-based edge segments (Fig. 1(e)). The final background (Fig. 1(h)) is reconstructed from the separated background edges (Fig. 1(f)).

Let us denote the source image as I, and its reflectance and background layers as I_R and I_B, such that

I = I_R + I_B. (1)

The gradient magnitude of I is denoted as G.
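The edge image G above is just the Sobel gradient magnitude of the source image. As an illustrative sketch (using SciPy/NumPy, not the authors' implementation):

```python
import numpy as np
from scipy import ndimage

def gradient_magnitude(image):
    """Sobel gradient magnitude G of a grayscale image (float array)."""
    gx = ndimage.sobel(image, axis=1)  # horizontal derivative
    gy = ndimage.sobel(image, axis=0)  # vertical derivative
    return np.hypot(gx, gy)

# A vertical step edge gives a strong response along the boundary
# and zero response in the flat regions.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
G = gradient_magnitude(img)
```

Edge pixels can then be taken as the pixels where G exceeds a small threshold.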
Similarly, G is composed of the reflectance component G_R and the background component G_B:

G = G_R + G_B. (2)

Here, our goal is to find a binary labelling (0 for a reflectance edge, 1 for a background edge) for each pixel in G, i.e., to label it as belonging to either G_R or G_B. To achieve this goal, each pixel in G is assigned two scores, namely the intensity score and the motion score, as described in the following.

Figure 2: (a) Vector angles are calculated in [0, 360°] to avoid ambiguities. (b) Edge segments in superpixels.

2.1 Intensity score

By observation, we notice that most reflection edges have lower intensities than background edges. While this is a simple prior, it provides an effective discrimination. Therefore, we use the pixel intensity of each edge as its intensity score, denoted S_intensity(i) ∈ [0, 255]. We also notice that there exist violating cases: reflection edges sometimes have larger intensities than background edges in some regions, for example when light sources are reflected or the reflected objects are exposed under strong illumination. Therefore, we need a second image so that we can exploit the motions between the two images for improvement.

2.2 Motion score

The SIFT-flow [12] has been utilized in [10] to warp multiple input images to a reference image. Here, we adopt the SIFT-flow for the motion calculation between two images. As the background layer is more prominent than the reflectance layer, motions are dominated by the background [10].

Figure 3: Comparison with the methods in [10] and [11]. In each example, the first, second, third and fourth columns show the input, the results of [11], the results of [10], and our results, respectively.

In other words, if the source image is warped to the target image by the SIFT-flow, ideally the backgrounds will be well aligned. However, due to interference from the reflected edges, the quality of the alignment is limited, especially at regions where the reflected edges intersect with the background edges. Thus, rather than warping images by SIFT-flows [10], we record motions at the edge positions. The smoothness constraint in the SIFT-flow enforces neighborhood smoothness such that noisy motions caused by image noise can be well regularized, creating a smoothed flow field [12]. Different levels of noise demand different strengths of smoothness. In our problem, the interference from the reflected edges can be considered a strong noise: to produce a smoothed flow field, regions containing only background require a weak regularization while regions with reflections desire a strong regularization. To capture the variations, we propose to calculate SIFT-flows with 10 different levels of smoothness, yielding a SIFT-flow hierarchy. Our observation is that a background pixel tends to have similar motion vectors across the SIFT-flow hierarchy while a reflected pixel will have varying motion vectors.

Let us define the motion vector of pixel i at the k-th hierarchy level as v_i^k. We choose u = (0, 1) as a base vector and calculate the angle between the motion vector v_i^k and the base vector u:

α_i^k = arccos( (u · v_i^k) / (||u|| ||v_i^k||) ). (3)

Note that 10 such angles are calculated for each pixel. Next, we compute the standard deviation of the angles at each pixel. Notably, as the range of α_i^k is [0, 180°], two vectors located symmetrically on different sides of the base vector would produce 0 variance; see Fig. 2(a) for an example. To fix this, we extend the angle range to [0, 360°] by adjusting the relative position of each motion vector to the base vector through the 2-D vector cross product:

α'_i^k = { α_i^k,          if u × v_i^k ≥ 0,
         { 360° − α_i^k,   if u × v_i^k < 0, (4)

where α'_i^k is the angle with range [0, 360°]. Next, the standard deviation σ_i of α'_i^k (k = 1, ..., 10) is calculated for each edge pixel. We use σ_i as our motion score S_motion(i).

2.3 Average of motion scores

The calculation of σ_i is independent at each edge pixel. To enforce spatial smoothness, we propose to group edge pixels into edge segments. To this end, we compute image superpixels [2], and edge pixels are grouped into an edge segment if they belong to the same superpixel. Fig. 2(b) shows an example of edge segments generated according to superpixels. We then encourage the pixels in the same segment to have the same motion score by taking their average, which improves the spatial smoothness.

2.4 Combination score

The two scores described above are complementary to each other. For example, reflections with strong intensities would be classified as background if only the intensity score were involved; however, the large motion variations of reflections help us make the right decision. Therefore, a more suitable score is a combination of the two. The combination score S_combine(i) at pixel i is obtained as:

S_combine(i) = S_intensity(i) − ε S_motion(i), (5)

where ε balances the two scores and is set to 0.5 in our implementation. The minus sign is used here because edges with large intensity scores are more likely to belong to the background while edges with large motion scores are more likely to belong to the reflectance. The edge separation is performed by thresholding the combination scores, which gives the binary labelling E(i):

E(i) = { 1,  S_combine(i) ≥ τ,
       { 0,  S_combine(i) < τ, (6)

where τ is set to 80 in our implementation.
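The scoring steps of Secs. 2.2-2.4 can be sketched as follows. This is a simplified NumPy illustration under stated assumptions, not the authors' code: the flow hierarchy is assumed to be precomputed as a (K, H, W, 2) stack of motion vectors (the paper obtains it from SIFT-flow at K = 10 smoothness levels), `segments` is a given superpixel label map, and ε = 0.5, τ = 80 follow the paper.

```python
import numpy as np

def signed_angle(v, u=(0.0, 1.0)):
    """Angle of vectors v (..., 2) w.r.t. base vector u, in [0, 360) degrees.
    The 2-D cross product disambiguates the two sides of u (Eq. 4)."""
    u = np.asarray(u)
    dot = v[..., 0] * u[0] + v[..., 1] * u[1]
    norm = np.linalg.norm(v, axis=-1) * np.linalg.norm(u)
    ang = np.degrees(np.arccos(np.clip(dot / np.maximum(norm, 1e-12), -1.0, 1.0)))
    cross = u[0] * v[..., 1] - u[1] * v[..., 0]
    return np.where(cross >= 0, ang, 360.0 - ang)

def motion_score(flows):
    """Per-pixel std of flow angles across the hierarchy (flows: (K, H, W, 2))."""
    return signed_angle(flows).std(axis=0)

def average_in_segments(score, segments):
    """Replace each pixel's score with the mean over its superpixel segment."""
    out = np.empty_like(score)
    for label in np.unique(segments):
        mask = segments == label
        out[mask] = score[mask].mean()
    return out

def separate_edges(intensity, flows, segments, eps=0.5, tau=80.0):
    """Binary labelling: 1 = background edge, 0 = reflection edge (Eqs. 5-6)."""
    s_motion = average_in_segments(motion_score(flows), segments)
    s_combine = intensity - eps * s_motion
    return (s_combine >= tau).astype(np.uint8)

# Toy example: in the left half the motion is identical at every hierarchy
# level (background-like); in the right half it flips direction at odd levels,
# so the angle variance (motion score) is large (reflection-like).
H, W, K = 4, 4, 10
flows = np.tile(np.array([1.0, 0.0]), (K, H, W, 1))
flows[1::2, :, 2:, :] = np.array([-1.0, 0.0])
segments = np.zeros((H, W), dtype=int)
segments[:, 2:] = 1
intensity = np.full((H, W), 120.0)
labels = separate_edges(intensity, flows, segments)
```

With these numbers, the stable half gets S_motion = 0 (score 120 ≥ τ, background) and the flipping half gets S_motion = 90 (score 120 − 45 = 75 < τ, reflection), illustrating how the motion term overrides a moderately strong intensity.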
2.5 Image reconstruction

After the separation of the gradient edges, we follow the approach in [7, 8, 10] to reconstruct the background image from the gradients. Levin et al. showed that the long-tailed gradient distribution of natural images is an effective prior for reconstruction in the gradient domain [7, 8]. The Laplacian distribution is thus adopted to approximate the long-tailed distribution. For the reconstruction, the probability P(I_B, I_R) is maximized, which is equivalent to minimizing −log P(I_B, I_R). Similar to [7, 8, 10], the energy function is defined as:

J(I_B) = Σ_{x,n} [ |(f_n ⊗ I_B)(x)| + |((I − I_B) ⊗ f_n)(x)| ]
       + λ Σ_{x∈E_B, n} (((I − I_B) ⊗ f_n)(x))^2
       + λ Σ_{x∈E_R, n} ((I_B ⊗ f_n)(x))^2, (7)

where f_n denotes a derivative filter and ⊗ is the convolution operator. E_B and E_R denote the edges labeled as background and reflectance, respectively. Here, we use derivative filters of two orientations and two degrees. The first term enforces the sparsity, while the last two terms require the recovered image gradients to follow our separation. This energy can be minimized efficiently using iteratively reweighted least squares.

3. EXPERIMENTS

Figure 3 shows the comparison between our method and the methods in [10] and [11]. Note that 5 images and one image are used in [10] and [11], respectively, to produce the results, whereas our method selects two of the 5 images used in [10]. It can be seen that our method produces better results; notice the strong reflectance residuals in the results of [10].

Figure 4 presents some more results of our method. In particular, Fig. 4(a), (b) and (c) show three traditional examples where the glass is placed between the background and the reflected objects. In Fig. 4(d), we capture a reflective screen of a laptop. Finally, a synthetic example is shown in Fig. 4(e).

Figure 4: More results of our method. Please find more in the supplementary file.

Next, we evaluate our method by turning off each component one by one.

3.1 Effect of motion and intensity scores

Figure 5 shows the effect of different scores. Fig. 5(b) shows the gradient edges of a green window in Fig. 5(a). Here, we sample several points and draw their motions as colored lines. As can be seen in Fig. 5(b), pixels on the reflected edges (edges of the cameraman) have different motions while pixels on the background (the trophy) have uniform motions. Fig. 5(c) shows the result obtained using only the intensity score: the details of the bulbs are lost due to their low intensities. Fig. 5(d) shows the result obtained using only the motion score: some parts of the image are contaminated due to inaccuracies in the motion estimation. The result obtained using both scores is shown in Fig. 5(e). It is clear that the shadow of the cameraman is wiped off, the contaminated region in Fig. 5(d) is restored, and the details of the bulbs are preserved.

Figure 5: The effect of different scores.

3.2 Effect of grouping edge pixels

For the source image shown in Fig. 6(a), we show in Fig. 6(b) and (c) the results without and with motion-score averaging over edge segments, respectively. Clearly, the averaging strategy removes the reflectance residuals quite successfully.

Figure 6: The effect of grouping edge pixels.

4. CONCLUSIONS

We have presented an effective method for reflection removal using two input images taken in front of a transparent glass under two slightly different view angles. Our method first computes two priors for each edge pixel, namely the intensity score (gradient magnitude) and the aggregated motion score (from a SIFT-flow hierarchy built upon a number of smoothness levels). The two scores are composed into a combination score, which is then used to distinguish background from reflection. We have presented several examples to demonstrate the necessity of each of the two scores and of the combination score. We have also presented results showing that our method achieves better performance than some existing works.

5. ACKNOWLEDGMENTS

This work has been supported by the National Natural Science Foundation of China (61502079 and 61370148).

6. REFERENCES

[1] A. Agrawal, R. Raskar, S. K. Nayar, and Y. Li. Removing photography artifacts using gradient projection and flash-exposure sampling. ACM Trans. Graph., volume 24, pages 828-835, 2005.
[2] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient graph-based image segmentation. Int. J. Comput. Vis., 59(2):167-181, 2004.
[3] K. Gai, Z. Shi, and C. Zhang. Blindly separating mixtures of multiple layers with spatial shifts. In Proc. CVPR, pages 1-8, 2008.
[4] K. Gai, Z. Shi, and C. Zhang. Blind separation of superimposed moving images using image statistics. IEEE Trans. Pattern Anal. Mach. Intell., 34(1):19-32, 2012.
[5] N. Kong, Y.-W. Tai, and S. Y. Shin. High-quality reflection separation using polarized images. IEEE Trans. on Image Processing, 20(12):3393-3405, 2011.
[6] N. Kong, Y.-W. Tai, and S. Y. Shin. A physically-based approach to reflection separation. In Proc. CVPR, pages 9-16, 2012.
[7] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. IEEE Trans. Pattern Anal. Mach. Intell., 29(9):1647-1654, 2007.
[8] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. IEEE Trans. Pattern Anal. Mach. Intell., 29(9):1647-1654, 2007.
[9] A. Levin, A. Zomet, and Y. Weiss. Separating reflections from a single image using local features. In Proc. CVPR, 2004.
[10] Y. Li and M. Brown. Exploiting reflection change for automatic reflection removal. In Proc. CVPR, pages 2432-2439, 2013.
[11] Y. Li and M. Brown. Single image layer separation using relative smoothness. In Proc. CVPR, pages 2752-2759, 2014.
[12] C. Liu, J. Yuen, and A. Torralba. SIFT Flow: Dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell., 33(5):978-994, 2011.
[13] N. Ohnishi, K. Kumaki, T. Yamamura, and T. Tanaka. Separating real and virtual objects from their overlapping images. In Proc. ECCV, pages 636-646, 1996.
[14] B. Sarel and M. Irani. Separating transparent layers of repetitive dynamic behaviors. In Proc. ICCV, pages 26-32, 2005.
[15] Y. Y. Schechner, N. Kiryati, and R. Basri. Separation of transparent layers using focus. Int. J. Comput. Vis., 39(1):25-39, 2000.
[16] Y. Y. Schechner, J. Shamir, and N. Kiryati. Polarization-based decorrelation of transparent layers: The inclination angle of an invisible surface. In Proc. ICCV, volume 2, pages 814-819, 1999.
[17] R. Szeliski, S. Avidan, and P. Anandan. Layer extraction from multiple images containing reflections and transparency. In Proc. CVPR, pages 246-253, 2000.
[18] T. Xue, M. Rubinstein, C. Liu, and W. T. Freeman. A computational approach for obstruction-free photography. ACM Trans. Graph., 34(4):79, 2015.