The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation

Similar documents
Parametric Texture Model based on Joint Statistics

Multiresolution Image Processing

Image pyramids and their applications Bill Freeman and Fredo Durand Feb. 28, 2006

Multi-scale Statistical Image Models and Denoising

Learning local evidence for shading and reflectance

Image Interpolation Using Multiscale Geometric Representations

Pyramid Coding and Subband Coding

CoE4TN3 Image Processing. Wavelet and Multiresolution Processing. Image Pyramids. Image pyramids. Introduction. Multiresolution.

Pyramid Coding and Subband Coding

Shift-invariance in the Discrete Wavelet Transform

INVARIANT CORNER DETECTION USING STEERABLE FILTERS AND HARRIS ALGORITHM

Steerable Filters from Erlang Functions

Courant Institute TR Nonlinear Image Representation via Local Multiscale Orientation

Filtering Applications & Edge Detection. GV12/3072 Image Processing.

A Parametric Texture Model based on Joint Statistics of Complex Wavelet Coefficients. Gowtham Bellala Kumar Sricharan Jayanth Srinivasa

Calculating the Distance Map for Binary Sampled Data

Disparity Search Range Estimation: Enforcing Temporal Consistency

Texture. Texture. 2) Synthesis. Objectives: 1) Discrimination/Analysis

Query by Fax for Content-Based Image Retrieval

ME/CS 132: Introduction to Vision-based Robot Navigation! Low-level Image Processing" Larry Matthies"

Digital Image Processing

WAVELET TRANSFORM BASED FEATURE DETECTION

Computer Vision. Recap: Smoothing with a Gaussian. Recap: Effect of σ on derivatives. Computer Science Tripos Part II. Dr Christopher Town

The most cited papers in Computer Vision

Denoising of Images corrupted by Random noise using Complex Double Density Dual Tree Discrete Wavelet Transform

Fast Anisotropic Gauss Filtering

Weichuan Yu, Kostas Daniilidis, Gerald Sommer. Christian Albrechts University. Preusserstrasse 1-9, D Kiel, Germany

Performance Analysis of Iris Recognition System Using DWT, CT and HOG

Feature Extraction and Image Processing, 2 nd Edition. Contents. Preface

Digital Image Processing

CHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING. domain. In spatial domain the watermark bits directly added to the pixels of the cover

Dense Motion Field Reduction for Motion Estimation

Design of Orthogonal Graph Wavelet Filter Banks

EE795: Computer Vision and Intelligent Systems

Efficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain

Texture Similarity Measure. Pavel Vácha. Institute of Information Theory and Automation, AS CR Faculty of Mathematics and Physics, Charles University

Color and Texture Feature For Content Based Image Retrieval

Comment on Numerical shape from shading and occluding boundaries

Pictures at an Exhibition

Image Fusion Using Double Density Discrete Wavelet Transform

William Tafel Freeman. B.S., Physics with Honors and Distinction, Stanford University, M.S., Electrical Engineering, Stanford University, 1979

A Neural Algorithm of Artistic Style. Leon A. Gatys, Alexander S. Ecker, Mattthias Bethge Presented by Weidi Xie (1st Oct 2015 )

Rectification and Distortion Correction

Texture Analysis of Painted Strokes 1) Martin Lettner, Paul Kammerer, Robert Sablatnig

BSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy

Local Feature Detectors

Autoregressive and Random Field Texture Models

Nonlinear Multiresolution Image Blending

Reconstruction PSNR [db]

Disparity from Monogenic Phase

Today: non-linear filters, and uses for the filters and representations from last time. Review pyramid representations Non-linear filtering Textures

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 8, NO. 12, DECEMBER Image Representations Using Multiscale Differential Operators.

Feature Tracking and Optical Flow

ECG782: Multidimensional Digital Signal Processing

Schedule for Rest of Semester

Feature Tracking and Optical Flow

A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coecients

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 14, NO. 12, DECEMBER Minh N. Do and Martin Vetterli, Fellow, IEEE

Local qualitative shape from stereo. without detailed correspondence. Extended Abstract. Shimon Edelman. Internet:

Fitting Models to Distributed Representations of Vision

CS6670: Computer Vision

Final Exam Study Guide

Peripheral drift illusion

Exhaustive Generation and Visual Browsing for Radiation Patterns of Linear Array Antennas

Comparative Evaluation of DWT and DT-CWT for Image Fusion and De-noising

Evaluation of texture features for image segmentation

Frequency analysis, pyramids, texture analysis, applications (face detection, category recognition)

The Vehicle Logo Location System based on saliency model

Fast Region-of-Interest Transcoding for JPEG 2000 Images

CAP 5415 Computer Vision Fall 2012

Image Compression & Decompression using DWT & IDWT Algorithm in Verilog HDL

Module 8: Video Coding Basics Lecture 42: Sub-band coding, Second generation coding, 3D coding. The Lecture Contains: Performance Measures

12/07/17. Video Magnification. Magritte, The Listening Room. Computational Photography Derek Hoiem, University of Illinois

Lecture 7: Most Common Edge Detectors

Fundamentals of Digital Image Processing

Image Coding with Active Appearance Models

DIGITAL IMAGE PROCESSING

Image processing and features

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014

Dynamic Window Decoding for LDPC Convolutional Codes in Low-Latency Optical Communications

Comparison Between The Optical Flow Computational Techniques

Wavelet Transform (WT) & JPEG-2000

Matching. Compare region of image to region of image. Today, simplest kind of matching. Intensities similar.

Constraining Rainfall Replicates on Remote Sensed and In-Situ Measurements

CS 534: Computer Vision Texture

Extensions of H.264/AVC for Multiview Video Compression

Spatial Enhancement Definition

Visible and Long-Wave Infrared Image Fusion Schemes for Situational. Awareness

IMPROVED MOTION-BASED LOCALIZED SUPER RESOLUTION TECHNIQUE USING DISCRETE WAVELET TRANSFORM FOR LOW RESOLUTION VIDEO ENHANCEMENT

Digital Image Processing Chapter 11: Image Description and Representation

Overcomplete Steerable Pyramid Filters and Rotation Invariance

Leaf Classification from Boundary Analysis

Tracking Computer Vision Spring 2018, Lecture 24

Digital Image Processing COSC 6380/4393

ELEC Dr Reji Mathew Electrical Engineering UNSW

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform

Capturing, Modeling, Rendering 3D Structures

Lucas-Kanade Motion Estimation. Thanks to Steve Seitz, Simon Baker, Takeo Kanade, and anyone else who helped develop these slides.

Lecture 16: Computer Vision

Fast Image Registration via Joint Gradient Maximization: Application to Multi-Modal Data

Transcription:

MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation Eero P. Simoncelli, William T. Freeman TR95-15 December 1995 Abstract We describe an architecture for efficient and accurate linear decomposition of an image into scale and oeientation subbands. The basis functions of this decomposition are directional derivative operators of any desired order. We describe the construction and implementation of the transform. 2nd Ann. IEEE Intl. Conf. on Image Processing. Washington, DC. October, 1995 This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved. Copyright c Mitsubishi Electric Research Laboratories, Inc., 1995 201 Broadway, Cambridge, Massachusetts 02139

MERLCoverPageSide2

Presented at: 2nd Annual IEEE International Conference on Image Processing. Washington, DC. October, 1995. THE STEERABLE PYRAMID: A FLEXIBLE ARCHITECTURE FOR MULTI-SCALE DERIVATIVE COMPUTATION Eero P Simoncelli GRASP Laboratory, Room 335C 3401 Walnut St, Rm. 335C Philadelphia, PA 19104-6228 William T Freeman Mitsubishi Electric Research Laboratory Cambridge, MA 02139 ABSTRACT We describe an architecture for efficient and accurate linear decomposition of an image into scale and orientation subbands. The basis functions of this decomposition are directional derivative operators of any desired order. We describe the construction and implementation of the transform. 1 Differential algorithms are used in a wide variety of image processing problems. For example, gradient measurements are used as a first stage of many edge detection, depth-from-stereo, and optical flow algorithms. Higher-order derivatives have also been found useful in these applications. Extraction of these derivative quantities may be viewed as a decomposition of a signal via terms of a local Taylor series expansions [1]. Another widespread tool in signal and image processing is multi-scale decomposition. Apart from the advantages of decomposing signals into information at different scales, the typical recursive form of these algorithms leads to large improvements in computational efficiency. Many authors have combined multi-scale decompositions with differential measurements (eg., [2, 3]). In these cases, a multi-scale pyramid is constructed, and then differential operators (typically, differences of neighboring pixels) are applied to the subbands of the pyramid. Since both the pyramid decomposition and the derivative operation are linear and shift-invariant, we may combine them into a single operation. The advantages of doing so are that the resulting derivatives may be more accurate (see [4]). In this paper, we propose a simple, efficient decomposition architecture for combining these two operations. The decomposition is the latest incarnation of 1 Source code and filter kernels for implementation of the steerable pyramid are available via anonymous ftp from ftp.cis.upenn.edu:pub/eero/steerpyr.tar.z steerable pyramid, as developed in [5, 6]. Similar representations have been developed by Perona [7]. In this linear decomposition, an image is subdivided into a collection of subbands localized in both scale and orientation. The scale tuning of the filters is constrained by a recursive system diagram (described below). The orientation tuning is constrained by the property of steerability [5]. A set of filters form a steerable basis if (1) they are rotated copies of each other, and (2) a copy of the filter at any orientation may be computed as a linear combination of the basis filters. The simplest example of a steerable basis is a set of N + 1 N th-order directional derivatives. In addition to having steerable orientation subbands, the transform we describe is designed to be "self-inverting" (i.e., the matrix corresponding to the inverse transformation is equal to the transpose of the forward transformation matrix) 2, and is essentially aliasing-free. Most importantly, the pyramid can be designed to produce any number of orientation bands, k. The resulting transform will be overcomplete by a factor of 4k=3. A summary of these properties, in comparison with two well-known multi-scale decompositions is given in table 1. Note that the steerable pyramid retains some of the advantages of orthonormal wavelet transforms (eg., basis functions are localized in space and spatial-frequency; the transform is a tight frame), but improves on some of their disadvantages (eg., aliasing is eliminated; steerable orientation decomposition). One obvious disadvantage is in computational efficiency: the steerable pyramid is substantially overcomplete. We now describe the steerable pyramid in more detail. The decomposition is most easily defined in the Fourier domain, where it is (ideally) polarseparable. Figure 1 contains a diagram of the 2 In the wavelet literature, such a transform is known as a tight frame [8]

Laplacian Pyramid Dyadic QMF/Wavelet Steerable Pyramid self-inverting (tight frame) no yes yes overcompleteness 4=3 1 4k=3 aliasing in subbands perhaps yes no rotated orientation bands no only on hex lattice [9] yes Table 1: Properties of the Steerable Pyramid relative to two other well-known multi-scale representations. to multiplication in the Fourier domain by the ramp raised to a power, and thus the angular portion of the filter is cos() N for an Nth-order directional derivative. Knuttson and Granlund have also developed polar-separable filters with such angular components [10]. The steerability of such functions has been discussed in our previous work [5, 6]. 2. RADIAL DECOMPOSITION Figure 1: Idealized illustration of the spectral decomposition performed by a steerable pyramid with k = 4. Frequency axes range from, to. The basis functions are related by translations, dilations and rotations (except for the initial highpass subband and the final lowpass subband). For example, the shaded region corresponds to the spectral support of a single (vertically-oriented) subband. idealized frequency response of the subbands, for k = 4. We write the the Fourier magnitude of the ith oriented bandpass filter in polar-separable form: B i (~!)=A(, i )B(!); where = tan,1 (! y =! x ), i = 2 and! = j~!j. Below, we describe the constraints on the two com- k ponents A() and B(!). 1. ANGULAR DECOMPOSITION The angular portion of the decomposition, A(), is determined by the desired derivative order. A directional derivative operation in the spatial domain corresponds to multiplication by a linear ramp function in the Fourier domain, which we rewrite in polar coordinates as follows:,j! x =,j! cos() (note that we have described a derivative operator in the x direction). We ignore the imaginary constant, and the factor of!, which is absorbed into the radial portion of the function. The relevant angular portion of the first derivative operator (in the x direction) is thus cos(). Higher-order directional derivatives correspond The radial function, B(!), is constrained by both the desire to build the decomposition recursively (i.e., using a pyramid algorithm), and the need to prevent aliasing from occurring during subsampling operations. The recursive system diagram for B(!) is given in figure 2. The filters H 0 (!) and L 0 (!) are necessary for preprocessing the image in preparation for the recursion. The recursive portion of the diagram corresponds to the subsystem contained in the dashed box. This subsystem decomposes a signal into two portions (lowpass and highpass). The lowpass portion is subsampled, and the recursion is performed by repeatedly applying the recursive transformation to the lowpass signal. The constraints on the filters in the diagram are as follows: 1. Bandlimiting (to prevent aliasing in the subsampling operation): L 1 (!)=0 2. Flat System Response: forj!j >=2: jh 0 (!)j 2 + jl 0 (!)j 2 jl 1 (!)j 2 + jb(!)j 2 = 1: 3. Recursion: jl 1 (!=2)j 2 = jl 1 (!=2)j 2 jl 1 (!)j 2 + jb(!)j 2 : Typically, we choose L 0 (!) = L 1 (!=2), so that the initial lowpass shape is the same as that used within the recursion. An idealized illustration of filters that satisfy these constraints is given in figure 3. Note that L 1 (!) is strictly bandlimited, and B(!) is power-complementary.

H 0 (- ) H 0 ( ) L 0 (- ) B(- ) B( ) L 0 ( ) L 1 (- ) 2D 2U L 1 ( ) recursive subsystem Figure 2: System diagram for the radial portion of the steerable pyramid, illustrating the filtering and sampling operations, and the recursive construction. Boxes containing 2D and 2U correspond to downsampling and upsampling by a factor of 2. All other boxes correspond to standard 2D convolution. The circles correspond to the transform coefficients. The recursive construction of a pyramid is achieved by inserting a copy of the diagram contents enclosed by the dashed rectangle at the location of the solid circle (i.e., the lowpass branch). L 0 ( ) H 0 ( ) L 1 ( ) B( ) Figure 3: Idealized depiction of filters satisfying the constraints of the block diagram in figure 2. Plots show Fourier spectra over the range [0; ]. 3. IMPLEMENTATION We have designed filters using weighted least squares techniques in the Fourier domain to approximately fit the constraints detailed above. The resulting filters are fairly compact (typically 9 9 taps) and accurate (reconstruction error on the order of 45dB). Such filters may be designed for different values of k, depending on the application. For example, a design with a single band at each scale (k = 1) serves as a (self-inverting) replacement for the Laplacian pyramid. A design with two bands (k = 2) will compute multi-scale image gradients, which may be used for computations of local orientation, stereo disparity or optical flow. Higher values of k correspond to higher order terms in a multi-scale Taylor series. Figure 4 illustrates a 3-level steerable pyramid decomposition of a disk image, with k = 1. Shown are the bandpass images and the final lowpass image (the initial highpass image is not shown). As noted above, this pyramid may be used in applications where the Laplacian pyramid has been found useful, such as in image coding. The advantage is that the steerable pyramid is selfinverting, and thus the errors introduce by quantization of the subbands will not appear as lowfrequency distortions upon reconstruction. Figure 5 illustrates a 3-level steerable pyramid decomposition with k = 3. The filters are Figure 4: A 3-level k = 1 (non-oriented) steerable pyramid. Shown are the bandpass images and the final lowpass image. directional second derivatives oriented at 2 f,2=3; 0; 2=3g. Such a decomposition can be used for orientation analysis, edge detection, etc. We have explored the use of this decomposition in a number of applications, including image enhancement, orientation decomposition and junction identification, texture blending, depth-fromstereo, and optical flow. Space limitations prevent full description of these applications here; some previous results are described in [5, 6].

[9] Eero P Simoncelli and Edward H Adelson. Non-separable extensions of quadrature mirror filters to multiple dimensions. Proceedings of the IEEE: Special Issue on Multidimensional Signal Processing, April 1990. Also available as MIT Media Laboratory Vision and Modeling Technical Report #119. Figure 5: A 3-level k = 3 (second derivative) steerable pyramid. Shown are the three bandpass images at each scale and the final lowpass image. [10] H Knutsson and G H Granlund. Texture analysis using two-dimensional quadrature filters. In IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management, pages 206 213, 1983. 4. REFERENCES [1] J. J. Koenderink and A. J. van Doorn. Representation of local geometry in the visual system. Biol. Cybern., 55:367 375, 1987. [2] B D Lucas and T Kanade. An iterative image registration technique with an application to stereo vision. In Proc. Seventh IJCAI, pages 674 679, Vancouver, 1981. [3] Peter Burt. Pyramid methods in image processing. In Proceedings of the First International Conference on Image Processing, Austin, November 1994. plenary presentation. [4] Eero P Simoncelli. Design of multidimensional derivative filters. In First International Conference on Image Processing, Austin, Texas, November 1994. [5] W T Freeman and E H Adelson. The design and use of steerable filters. IEEE Pat. Anal. Mach. Intell., 13(9):891 906, 1991. [6] Eero P Simoncelli, William T Freeman, Edward H Adelson, and David J Heeger. Shiftable multi-scale transforms. IEEE Trans. Information Theory, 38(2):587 607, March 1992. Special Issue on Wavelets. [7] H. Greenspan, S. Belongie, R. Goodman, P. Perona, S. Rakshit, and C. H. Anderson. Overcomplete steerable pyramid filters and rotation invariance. In Proceedings, CVPR, pages 222 228, 1994. [8] I Daubechies. Ten lectures on wavelets. In Regional Conference: Society for Industrial and Applied Mathematics, Philadelphia, PA, 1992.