Beyond Wavelets: Directional Multiresolution Image Representation
Minh N. Do
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
www.ifp.uiuc.edu/~minhdo
minhdo@uiuc.edu
What Do Image Processors Do for a Living? Compression
[Figure: original Lena image (256 x 256 pixels, 24-bit RGB) and its JPEG2000-compressed version (compression ratio 43:1).]
What Do Image Processors Do for a Living? Denoising (restoration/filtering)
[Figure: noisy image and the restored clean image.]
What Do Image Processors Do for a Living? Feature extraction (e.g. for content-based image retrieval)
[Figure: retrieval pipeline. Images go through feature extraction to produce signatures; the query goes through the same feature extraction; a similarity measurement returns the N best-matched images.]
Fundamental Question: Parsimonious Representation of Visual Information
[Figure: a randomly generated image next to a natural image.]
Natural images live in a very tiny part of the image space (e.g. $\mathbb{R}^{512 \times 512 \times 3}$).
Mathematical Foundation: Sparse Representations
Fourier, wavelets, ... = construction of bases for signal expansions:
$x = \sum_n c_n \psi_n$, where $c_n = \langle x, \psi_n \rangle$.
Linear approximation (keep the first $M$ components): $\hat{x}_M = \sum_{n=1}^{M} c_n \psi_n$.
Non-linear approximation (keep the best $M$ components): $\tilde{x}_M = \sum_{n \in I_M} c_n \psi_n$.
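The difference between keeping the first M and the best M coefficients can be sketched numerically. This is a minimal illustration, assuming the orthonormal DFT as the basis and a random-walk test signal; the function name `approximate` and the test signal are hypothetical choices, not taken from the slides.

```python
import numpy as np

def approximate(x, M, mode):
    """M-term approximation of x in the orthonormal DFT basis (assumed basis)."""
    c = np.fft.fft(x, norm="ortho")            # expansion coefficients c_n
    if mode == "linear":
        keep = np.arange(M)                     # first M coefficients in index order
    else:
        keep = np.argsort(np.abs(c))[-M:]       # M coefficients of largest magnitude
    c_kept = np.zeros_like(c)
    c_kept[keep] = c[keep]
    return np.fft.ifft(c_kept, norm="ortho")

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(256))         # hypothetical test signal
M = 32

err_linear = np.linalg.norm(x - approximate(x, M, "linear"))
err_nonlinear = np.linalg.norm(x - approximate(x, M, "nonlinear"))
print(err_linear, err_nonlinear)
```

Because the basis is orthonormal, the squared error equals the energy of the discarded coefficients, so the best-M choice can never do worse than the first-M choice.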
Fourier vs. Wavelets
Non-linear approximation: N = 1024 data samples; keep M = 128 coefficients.
[Figure: three plots over sample index 0 to 1000: the original signal; the approximation using Fourier (of size 1024), SNR = 22.03 dB; the approximation using wavelets (Daubechies 4), SNR = 37.67 dB.]
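A comparison in the same spirit can be reproduced with a few lines of NumPy. This sketch makes two assumptions that differ from the slide: it uses the Haar wavelet rather than Daubechies 4 (to stay self-contained), and it uses a hypothetical piecewise-constant signal rather than the signal in the figure. All function names are illustrative.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal Haar transform of a signal whose length is a power of two."""
    a = np.asarray(x, dtype=float)
    details = []
    while a.size > 1:
        details.append((a[0::2] - a[1::2]) / np.sqrt(2))  # detail coefficients
        a = (a[0::2] + a[1::2]) / np.sqrt(2)              # coarser approximation
    return np.concatenate([a] + details[::-1])            # coarsest first

def haar_inverse(c):
    """Inverse of haar_forward."""
    a = c[:1].copy()
    pos = 1
    while pos < c.size:
        d = c[pos:pos + a.size]
        pos += a.size
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

def keep_best(c, M):
    """Zero all but the M coefficients of largest magnitude."""
    out = np.zeros_like(c)
    idx = np.argsort(np.abs(c))[-M:]
    out[idx] = c[idx]
    return out

N, M = 1024, 128
n = np.arange(N)
x = np.where(n < 300, 20.0, np.where(n < 700, -10.0, 30.0))  # hypothetical signal

x_haar = haar_inverse(keep_best(haar_forward(x), M))
x_fourier = np.fft.ifft(keep_best(np.fft.fft(x, norm="ortho"), M), norm="ortho")

err_haar = np.linalg.norm(x - x_haar)
err_fourier = np.linalg.norm(x - x_fourier)
print(err_haar, err_fourier)
```

For this signal each jump touches at most one Haar coefficient per scale, so 128 coefficients are more than enough, while the Fourier coefficients of a jump decay slowly and the truncation leaves a visible error.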
The Success of Wavelets
Wavelets provide a sparse representation for piecewise smooth signals.
Multiresolution, tree structures, fast transforms and algorithms, etc.
Unifying theory: fruitful interaction between different fields.
Is This the End of the Story?
The Failure of Wavelets
Wavelets (separable transforms) have some limitations in higher dimensions.
In 1-D: wavelets are well adapted to abrupt changes or singularities.
In 2-D: separable wavelets are well adapted to point singularities (only).
But there are (mostly) line and curve singularities...
Goal: Sparse Representation for Typical Images with Smooth Contours
[Figure: image model with smooth regions separated by a smooth boundary.]
Goal: exploring the intrinsic geometrical structure in natural images.
Action is at the edges!
Wavelet vs. New Scheme
[Figure: two panels, "Wavelet" and "Xlet".]
For images:
Wavelet scheme... fails to recognize that the boundary is smooth.
New scheme... requires challenging non-separable constructions.
And What Nature Tells Us...
Human visual system:
Extremely efficient: $10^7$ bits $\to$ 20-40 bits (per second).
Receptive fields are characterized as localized, multiscale and oriented.
Sparse components of natural images (Olshausen and Field, 1996):
[Figure: search for a sparse code from 16 x 16 patches of natural images.]
Recent Breakthrough from Harmonic Analysis: Curvelets [Candès and Donoho, 1999]
Optimal representation for functions in $\mathbb{R}^2$ with curved singularities.
Key idea: anisotropy scaling relation for curves: width $\approx$ length$^2$.
[Figure: a curve $u = u(v)$ in local coordinates $(u, v)$ with a rectangle of width $w$ and length $l$.]
Wish List for New Image Representations
Multiresolution... successive refinement
Localization... both space and frequency
Critical sampling... correct joint sampling
Directionality... more directions
Anisotropy... more shapes
Our emphasis is on a discrete framework that leads to algorithmic implementations.
Challenge: Being Digital!
Pixelization problem: [figure]
Rotation problem: [figure]
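How Do Curvelets Improve Wavelets?
[Figure: along a boundary, wavelet basis functions with support of size $2^{-j}$ and curvelet basis functions with support $c\,2^{-j/2}$ by $2^{-j}$.]
Along a smooth boundary (a $C^2$ function), at the scale $2^{-j}$:
Wavelet: $O(2^{j})$ significant coefficients, so $\|x - \tilde{x}_M^{(\mathrm{wavelet})}\|_2^2 \sim O(M^{-1})$.
Curvelet: $O(2^{j/2})$ significant coefficients, so $\|x - \tilde{x}_M^{(\mathrm{curvelet})}\|_2^2 \sim O(M^{-2})$.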
Proposed Filter Bank Approach
Idea: multiscale and directional decomposition.
Multiscale decomposition: capture point discontinuities, followed by...
Directional decomposition: link point discontinuities into linear structures.
Analogy: Hough Transform in Computer Vision
[Figure: input image $\Rightarrow$ edge image $\Rightarrow$ Hough image.]
Challenges:
Perfect reconstruction.
Fixed transform with low redundancy.
Sparse representation for images with smooth contours.
Multiscale Decomposition using Laplacian Pyramids
[Figure: block diagrams of the decomposition and the reconstruction.]
Reason: avoid frequency scrambling due to downsampling ($\downarrow$) of the highpass (HP) channel.
Laplacian pyramid as a frame operator: a tight frame exists.
New reconstruction: efficient filter bank for the dual frame (pseudo-inverse).
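One level of a Laplacian pyramid, with both the usual reconstruction and a dual-frame style reconstruction, can be sketched as follows. This is a minimal illustration under stated assumptions: the lowpass/downsample operator `H` is a 2 x 2 block mean and the upsample/interpolate operator `G` is 2 x 2 replication (both hypothetical choices, picked because they satisfy H(G(c)) = c). The second reconstruction uses the form G(c - H(d)) + d, which is the pseudo-inverse for this particular pair of operators; it is not claimed to be the exact filter bank on the slide.

```python
import numpy as np

def H(x):
    """Lowpass and downsample by 2: mean over each 2 x 2 block (assumed filter)."""
    rows, cols = x.shape
    return x.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))

def G(coarse):
    """Upsample by 2: replicate each sample over a 2 x 2 block (assumed filter)."""
    return np.kron(coarse, np.ones((2, 2)))

def lp_decompose(x):
    """One pyramid level: coarse image c and prediction residual d."""
    c = H(x)
    d = x - G(c)
    return c, d

def lp_reconstruct_usual(c, d):
    """Usual reconstruction: add the residual back to the prediction."""
    return G(c) + d

def lp_reconstruct_dual(c, d):
    """Dual-frame style reconstruction: also removes the lowpass part of d."""
    return G(c - H(d)) + d

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8))
c, d = lp_decompose(x)

# Both reconstructions are exact on clean coefficients.
exact_usual = np.allclose(lp_reconstruct_usual(c, d), x)
exact_dual = np.allclose(lp_reconstruct_dual(c, d), x)

# With noise added to d, the dual reconstruction projects part of the noise away.
noise = 0.1 * rng.standard_normal(d.shape)
err_usual = np.linalg.norm(lp_reconstruct_usual(c, d + noise) - x)
err_dual = np.linalg.norm(lp_reconstruct_dual(c, d + noise) - x)
print(exact_usual, exact_dual, err_usual, err_dual)
```

The output has 64 + 16 coefficients for a 64-pixel input, which is the redundancy that makes the pyramid a frame rather than a basis.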
Directional Filter Banks (DFB)
Feature: division of the 2-D spectrum into fine slices using iterated tree-structured filter banks.
[Figure: frequency plane $(\omega_1, \omega_2)$ from $(-\pi, -\pi)$ to $(\pi, \pi)$, divided into directional subbands numbered 0 to 7.]
Background: Bamberger and Smith ('92) cleverly used quincunx FBs, modulation and shearing.
We propose: use the DFB to construct directional bases, plus orthogonality.
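Simplified DFB: Two Building Blocks
Frequency splitting by the quincunx filter banks (Vetterli '84).
[Figure: two-channel quincunx filter bank: input $x$, downsampling by $Q$ into channels $y_0$ and $y_1$, upsampling by $Q$, and summation to $\hat{x}$.]
Shearing by resampling. [figure]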
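Sampling in Multidimensional Filter Banks
Sampling is represented by integer matrices: $x_d[n] = x[Mn]$.
Quincunx sampling lattice [figure: lattice in the $(n_1, n_2)$ plane]:
$Q_0 = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$, $Q_1 = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$.
Resampling matrices for shearing:
$R_0 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, $R_1 = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$, $R_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, $R_3 = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}$.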
Examples of Multidimensional Sampling
[Figure: examples of resampling by $R_2$ and $Q_1$.]
Key [Smith form]:
$Q_0 = R_1 D_0 R_2 = R_2 D_1 R_1$,
$Q_1 = R_0 D_0 R_3 = R_3 D_1 R_0$,
where $D_0 = \mathrm{diag}(2, 1)$ and $D_1 = \mathrm{diag}(1, 2)$.
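The Smith-form identities can be checked numerically. In this sketch the matrix entries are assumptions: the signs were chosen so that both factorizations of each quincunx matrix hold, and should be compared against the original slides. The last lines list the points $Mn$ that the rule $x_d[n] = x[Mn]$ would read for $M = Q_0$.

```python
import numpy as np

# Assumed entries (signs chosen so that the Smith-form identities hold).
Q0 = np.array([[1, -1], [1, 1]])
Q1 = np.array([[1, 1], [-1, 1]])
R0 = np.array([[1, 1], [0, 1]])
R1 = np.array([[1, -1], [0, 1]])
R2 = np.array([[1, 0], [1, 1]])
R3 = np.array([[1, 0], [-1, 1]])
D0 = np.diag([2, 1])
D1 = np.diag([1, 2])

smith_ok = (
    np.array_equal(Q0, R1 @ D0 @ R2)
    and np.array_equal(Q0, R2 @ D1 @ R1)
    and np.array_equal(Q1, R0 @ D0 @ R3)
    and np.array_equal(Q1, R3 @ D1 @ R0)
)

# Quincunx matrices keep one sample out of |det Q| = 2.
det_q0 = int(round(abs(np.linalg.det(Q0))))
det_q1 = int(round(abs(np.linalg.det(Q1))))

# Points visited by x_d[n] = x[Q0 n]: a checkerboard (even coordinate sum).
n = np.array([(n1, n2) for n1 in range(-3, 4) for n2 in range(-3, 4)]).T
lattice = Q0 @ n
on_checkerboard = bool(np.all(lattice.sum(axis=0) % 2 == 0))

print(smith_ok, det_q0, det_q1, on_checkerboard)
```

The factorization shows why quincunx sampling can be implemented as a shear, a separable downsampling by two along one axis, and another shear, since each $R_i$ has determinant one.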
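How Is Frequency Divided into Finer Directions?
[Figure: channel $H_k^{(l)}$ with sampling $M_k^{(l)}$ is split by a quincunx filter bank ($Q$) into $H_{2k}^{(l+1)}$ and $H_{2k+1}^{(l+1)}$, with sampling $M_{2k}^{(l+1)}$ and $M_{2k+1}^{(l+1)}$.]
Overall sampling: $M_k^{(l)} = [\,2 D_i^{\,l-2}\,]\, R^{s_l(k)}$, that is, separable sampling, then shearing.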
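Multichannel View of the Directional Filter Bank
[Figure: $2^l$ parallel channels, $k = 0, \ldots, 2^l - 1$: analysis filter $H_k$, downsampling by $S_k$, output $y_k$, upsampling by $S_k$, synthesis filter $G_k$; the channel outputs are summed to give $\hat{x}$.]
Use two separable sampling matrices:
$S_k = \mathrm{diag}(2^{l-1}, 2)$ for $0 \le k < 2^{l-1}$ (near-horizontal directions),
$S_k = \mathrm{diag}(2, 2^{l-1})$ for $2^{l-1} \le k < 2^l$ (near-vertical directions).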
Why Does Critical Sampling Work?
[Figure: frequency region of $H_k$ in the $(\omega_1, \omega_2)$ plane and its replicas after sampling by $S_k$.]
Frequency tiling in the MDFB:
$\sum_{m \in \mathbb{Z}^2} \delta_{R_k}(\omega - 2\pi S_k^{-T} m) = 1$ for all $k = 0, \ldots, 2^l - 1$ and $\omega \in \mathbb{R}^2$,
where $R_k$ is the ideal frequency region for $H_k$.
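A necessary bookkeeping condition for critical sampling is that the channel sampling rates add up to one. The sketch below checks this for the two separable sampling matrices, diag(2^(l-1), 2) and diag(2, 2^(l-1)); the function name `sampling_matrix` is hypothetical, and the check covers only the sample count, not the frequency tiling itself.

```python
import numpy as np

def sampling_matrix(k, l):
    """Separable sampling matrix S_k of an l-level DFB (k = 0 .. 2^l - 1)."""
    if not 0 <= k < 2 ** l:
        raise ValueError("channel index out of range")
    if k < 2 ** (l - 1):
        return np.diag([2 ** (l - 1), 2])   # near-horizontal channels
    return np.diag([2, 2 ** (l - 1)])       # near-vertical channels

l = 3
dets = [int(np.prod(np.diag(sampling_matrix(k, l)))) for k in range(2 ** l)]
total_rate = sum(1.0 / d for d in dets)     # fraction of samples kept, summed over channels
print(dets, total_rate)
```

Each of the 2^l channels keeps one sample in 2^l, so the transform produces exactly as many coefficients as input samples.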
General Bases from the DFB
An $l$-level DFB creates a local directional basis of $l^2(\mathbb{Z}^2)$:
$\{\, g_k^{(l)}[\,\cdot - S_k^{(l)} n\,] \,\}_{0 \le k < 2^l,\ n \in \mathbb{Z}^2}$
$G_k^{(l)}$ are directional filters. [figure]
Sampling lattices (spatial tiling): [Figure: two lattices with spacings labeled $2$ and $2^{l-1}$.]
Example of DFB Impulse Responses
[Figure: the 32 equivalent filters for the first half of the channels (basically horizontal directions) of a 6-level DFB that uses the Haar filters.]
Pyramidal Directional Filter Banks (PDFB)
Motivation:
+ add multiscale into the directional filter bank
+ improve its non-linear approximation power.
[Figure: frequency plane $(\omega_1, \omega_2)$ from $(-\pi, -\pi)$ to $(\pi, \pi)$; block diagram of a multiscale decomposition (downsampling by (2,2)) followed by a directional decomposition.]
Properties:
+ Flexible multiscale and directional representation for images (can have a different number of directions at each scale!)
+ Tight frame with small redundancy (< 33%)
+ Computational complexity: $O(N^2)$ for an $N \times N$ image.
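The redundancy bound can be checked by counting coefficients. Since a critically sampled directional stage does not change the number of samples it receives, the count below only tracks the Laplacian pyramid; the function name and the choice of 512 x 512 with four levels are illustrative assumptions.

```python
def pdfb_coefficient_count(N, levels):
    """Coefficients produced for an N x N image by a pyramid with `levels` stages.

    Each stage emits one bandpass image at the current size (which a critically
    sampled DFB would then split without adding samples) and passes on a
    lowpass image downsampled by (2, 2).
    """
    count = 0
    size = N
    for _ in range(levels):
        count += size * size      # bandpass image at this scale
        size //= 2                # lowpass image is downsampled by 2 per dimension
    return count + size * size    # final lowpass image

N = 512
total = pdfb_coefficient_count(N, 4)
redundancy = total / (N * N)
print(total, redundancy)  # 349184 1.33203125
```

The ratio is a partial sum of the geometric series 1 + 1/4 + 1/16 + ..., which stays below 4/3 for any number of levels.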
Multiresolution Analysis: Scale
[Figure: frequency plane $(\omega_1, \omega_2)$ showing $V_j$, $W_j$ and $V_{j-1}$.]
$V_{j-1} = V_j \oplus W_j$, $\quad L^2(\mathbb{R}^2) = \bigoplus_{j \in \mathbb{Z}} W_j$.
Multiresolution Analysis: Direction
[Figure: frequency region of $V_{j,k}^{(l)}$ split into $V_{j,2k}^{(l+1)}$ and $V_{j,2k+1}^{(l+1)}$.]
$V_{j,k}^{(l)} = V_{j,2k}^{(l+1)} \oplus V_{j,2k+1}^{(l+1)}$, $\quad V_j = \bigoplus_{k=0}^{2^l - 1} V_{j,k}^{(l)}$.
Directional Multiresolution Analysis: Contourlets
[Figure: frequency partition into $V_j$, $W_j$, $V_{j-1}$ and the directional subspaces $W_{j,k}^{(l_j)}$; spatial support of $\rho_{j,k,n}^{(l_j)}$ with sides labeled $2^j$ and $2^{j+l_j-2}$.]
$W_{j,k}^{(l)} = \mathrm{span}\{\rho_{j,k}^{(l)}(t - S_k^{(l)} n)\}_{n \in \mathbb{Z}^2}$
Theorem [DoV:02]. $\{\phi_{j_0,n},\ \rho_{j,k,n}^{(l_j)}\}_{j \le j_0,\ 0 \le k < 2^{l_j},\ n \in \mathbb{Z}^2}$ is a tight frame of $L^2(\mathbb{R}^2)$ for finite $l_j$.
Sampling Grids of Contourlets
[Figure: four panels (a)-(d) of sampling grids, with spacings labeled $w$, $l$, $w/4$ and $l/2$.]
Contourlet Features
Defined via iterated filter banks: fast algorithms, tree structures, etc.
With FIR filters: compactly supported frames (bases).
Based on rectangular grids: seamless translation between continuous and discrete worlds.
Different prototype functions ($\rho_{j,k}$) for different directions.
These functions are defined iteratively via filter banks.
Relation with the Curvelet Transform
To ensure the anisotropy scaling relation for curves, width $\approx$ length$^2$, the number of directions in the PDFB is doubled at every other finer scale (Starck et al.).
[Figure: frequency partition over the $(\omega_1, \omega_2)$ plane and the corresponding filter-bank tree: Laplacian pyramid (LP) stages followed by DFBs with 4 and 8 directions.]
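The doubling rule can be written down as a small schedule of DFB levels per pyramid scale. This is a hypothetical sketch: the function name, the starting number of levels, and the choice of which scale gets the first doubling are all assumptions, not the author's implementation.

```python
def dfb_levels_per_scale(n_scales, coarsest_levels):
    """DFB tree depth for each pyramid scale, listed from coarse to fine.

    One DFB level is added (doubling the number of directions) at every
    other finer scale.
    """
    return [coarsest_levels + i // 2 for i in range(n_scales)]

levels = dfb_levels_per_scale(4, 2)
directions = [2 ** l for l in levels]
print(levels, directions)  # [2, 2, 3, 3] [4, 4, 8, 8]
```

Moving two scales finer divides the scale by four while the number of directions only doubles, which is the discrete counterpart of width scaling like length squared.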
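Supports of Basis Functions
[Figure: at two scales, the support of an LP basis function combined (*) with the support of a DFB basis function gives (=) the support of a contourlet.]
Key point: each generation doubles spatial resolution as well as angular resolution.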
Contourlet Packets
Adaptive scheme to select the best tree for directional decomposition.
[Figure: frequency plane $(\omega_1, \omega_2)$ from $(-\pi, -\pi)$ to $(\pi, \pi)$.]
Contourlet packets: directional multiresolution elements with different shapes (aspect ratios).
They do not necessarily satisfy the curve scaling relation.
They can include wavelets!
Contourlet Approximation
[Figure: block diagrams of an LP followed by a DFB, with labels $\Pi_L$, "polynomial surface" and "see a wavelet".]
Directional Vanishing Moments
For a 2-D prototype function $\rho(x_1, x_2)$, consider the 1-D slice $\rho_{s,t}(x) = \rho(x, sx + t)$.
Definition. $\rho(x_1, x_2)$ has $L$ directional vanishing moments (DVMs) along the slope $s$ if
$\int \rho_{s,t}(x)\, x^n \, dx = 0$ for all $t \in \mathbb{R}$ and $0 \le n < L$.
Equivalently, the generating discrete filter has $L + 1$ zeros along the line $\omega_1 = s\,\omega_2$.
This gives a filter design problem for contourlet filter banks.
Note:
+ Wavelets (2-D) have DVMs for the horizontal and vertical directions.
+ Ridgelets (continuous-space) have DVMs for all directions except one.
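Predicted Gain by Non-Linear Approximation
[Figure: image model with smooth regions separated by a smooth boundary.]
If the boundary curve is twice differentiable, the non-linear approximation rates are:
Fourier: $\|x - \tilde{x}_M^{(\mathrm{Fourier})}\|_2^2 \sim O(M^{-1/2})$
Wavelet: $\|x - \tilde{x}_M^{(\mathrm{wavelet})}\|_2^2 \sim O(M^{-1})$
Contourlet: $\|x - \tilde{x}_M^{(\mathrm{contourlet})}\|_2^2 \sim O(M^{-2})$
... and cannot do better!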
Embedded Structure for Compression and Modeling
So far, best $M$-term approximation: $\hat{f}_M = \sum_{\lambda \in I_M} c_\lambda \rho_\lambda$, where $I_M$ is the set of indexes of the $M$ largest $|c_\lambda|$.
For compression, an additional cost is required to specify $I_M$:
Naive approach: $M \log_2 N$ bits.
With an embedded tree (wavelets): $M$ bits.
Embedded trees for wavelets are used in state-of-the-art image compression (EZW, ...), rate-distortion analysis (Cohen et al.), and multiscale statistical modeling (Baraniuk et al.).
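The gap between the two index costs is easy to quantify. The numbers below are an illustrative assumption (a 512 x 512 image, so N = 512 * 512 coefficients, with M = 4096 kept), and the embedded-tree figure is taken as M bits, the order of magnitude stated above.

```python
import math

N = 512 * 512                      # total number of coefficients (assumed image size)
M = 4096                           # number of coefficients kept

bits_per_index = math.ceil(math.log2(N))
naive_bits = M * bits_per_index    # one full index per kept coefficient
tree_bits = M                      # embedded tree: on the order of M bits

print(bits_per_index, naive_bits, tree_bits)  # 18 73728 4096
```

In this example the naive index list costs 18 times more than the tree-structured description, before any bit is spent on the coefficient values themselves.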
Contourlet Embedded Tree Structure
[Figure: frequency plane $(\omega_1, \omega_2)$ from $(-\pi, -\pi)$ to $(\pi, \pi)$.]
Embedded tree data structure for contourlet coefficients: successively locate the position and direction of image edges.
Example of Discrete Contourlet Transform
Wavelets vs. Contourlets
[Figure: a wavelet and a contourlet.]
Non-linear Approximation Experiments
Image size = 512 x 512. Keep M = 4096 coefficients.
[Figure: original image; wavelets: PSNR = 24.34 dB; contourlets: PSNR = 25.70 dB.]
Detailed Non-linear Approximations
M      Wavelets MSE   Contourlets MSE
4      1.68e-4        1.68e-4
16     1.66e-4        1.63e-4
64     1.60e-4        1.55e-4
256    1.44e-4        1.43e-4
Denoising Experiments
[Figure: original image; noisy image (SNR = 9.55 dB); wavelet denoising (SNR = 13.82 dB); contourlet denoising (SNR = 15.42 dB).]
Speculation: Another Wavelet Story?
Wavelets (harmonic analysis), pyramids (computer vision), filter banks (signal processing)
$\Rightarrow$ Multiresolution Analysis
$\Rightarrow$ Theory, Algorithms, Applications
1-D story: scale and shift
Speculation: Another Wavelet Story?
Curvelets (harmonic analysis), Hough transform (computer vision), directional filter bank (signal processing)
$\Rightarrow$ Directional Multiresolution Analysis
$\Rightarrow$ Theory, Algorithms, Applications
2-D story: scale, space, and direction
Beyond 2-D and Single Image... New Audio-Visual Paradigm
Existing audio-visual recording and playback:
Single camera and microphone
Little processing
Viewers: passive
Future:
Sensors (cameras, microphones) are cheap
Massive computing and storage capabilities are available
Viewers: interactive, immersive, remote
This requires new signal processing theory and algorithms.
Note: video can be treated as a special case (frames in one shot correspond to multiple images of a scene).
Image-based Rendering
[Figure: pipeline with stages Recording, Processing, Playback and Archive.]
Goal: synthesize an arbitrary viewpoint from a set of fixed sensors.
Input data arrive at a huge rate, for example: 12 M/frame x 10 frames/s x 15 views = 1.8 Gbps.
Challenge: new representations for IBR data that incorporate geometric constraints.
Light-field Parameterization
[Figure: camera plane, image plane and object surface, with rays $L(u, v, s, t)$ and $L(u, v, s, t')$.]
Each light ray is addressed by a 4-D coordinate $(u, v, s, t)$.
Assuming the object surface is Lambertian, $L(u, v, s, t) = L(u, v, s, t')$.
The 4-D light field data $L(u, v, s, t)$ lie in lower-dimensional manifolds.
Epipolar Constraint from Multiple Views
[Figure: epipolar image, with axes labeled $t$, $u$ and $s$.]
Linear singularities in epipolar planes... ridgelets?
Curved singularities in image planes... contourlets?
Ultimate: high-dimensional representations that can deal effectively with lower-dimensional singularities.
Summary
Strong motivation for more powerful image representations: scale, space, and direction.
New two-dimensional discrete framework and algorithms:
Flexible directional and multiresolution image representation.
Effective for images with smooth contours: contourlets.
Dream: another fruitful interaction between harmonic analysis, computer vision, and signal processing.
Collaborators
Prof. Martin Vetterli (EPFL and UC Berkeley)
Current graduate students (ECE @ UIUC):
Arthur L. A. da Cunha
Yue Lu
David S. Petruncio
Duncan Po
Dan Xu
Jianping Zhou