Sparse & Redundant Representations and Their Applications in Signal and Image Processing

Sparseland: An Estimation Point of View. Michael Elad, The Computer Science Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel

A Strange Experiment

What If? Consider the denoising problem min_α ||α||_0 s.t. ||Dα − z||_2^2 ≤ nσ², and suppose that we can find a group of J candidate solutions {α_j}_{j=1}^J such that each is sparse, ||α_j||_0 ≪ n, and each explains the measurements, ||Dα_j − z||_2^2 ≤ nσ². Basic questions: What could we do with such a set of competing solutions in order to better denoise z? Why should this help? How shall we practically find such a set of solutions? Relevant work: [Leung & Barron ('06)], [Larsson & Selen ('07)], [Schniter et al. ('08)], [Elad & Yavneh ('08)], [Giraud ('08)], [Protter et al. ('10)].

Why Bother? Because each representation conveys a different story about the signal. Because pursuit algorithms are often wrong in finding the sparsest representation, and relying on their solution is then too sensitive. And maybe there are deeper reasons?

Generating Many Sparse Solutions. Our answer: randomizing the OMP.
Initialization: k = 0, α_0 = 0, r_0 = z − Dα_0 = z, and S_0 = ∅.
Main iteration (k ← k+1):
1. Compute p(i) = |d_i^T r_{k−1}| for 1 ≤ i ≤ m
2. Choose i_0 s.t. p(i_0) ≥ p(i) for all 1 ≤ i ≤ m
3. Update support: S_k = S_{k−1} ∪ {i_0}
4. LS: α_k = argmin_α ||Dα − z||_2 s.t. supp(α) = S_k
5. Update residual: r_k = z − Dα_k
Stop if ||r_k||_2 is small enough; otherwise iterate.
We randomize step 2: choose i_0 at random, with probability proportional to exp{c·(d_i^T r_{k−1})²}. For now, let's set the parameter c manually for best performance. Later we shall define a way to set it automatically.
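
Below is a minimal NumPy sketch of such a randomized OMP, included only to make the procedure concrete. The function name, the stopping rule ||r_k||² ≤ nσ², and the numerical stabilization of the sampling weights are assumptions, not part of the original slides.

```python
import numpy as np

def random_omp(D, z, sigma, c, rng, max_atoms=None):
    """One randomized-OMP run: like OMP, but the next atom is drawn at
    random with probability proportional to exp(c * (d_i^T r)^2)."""
    n, m = D.shape
    max_atoms = max_atoms or n
    support = []
    alpha = np.zeros(m)
    residual = z.copy()
    while np.linalg.norm(residual) ** 2 > n * sigma ** 2 and len(support) < max_atoms:
        corr = D.T @ residual                    # d_i^T r_{k-1} for all atoms
        logits = c * corr ** 2
        logits[support] = -np.inf                # never re-pick a chosen atom
        probs = np.exp(logits - logits.max())    # stabilized sampling weights
        probs /= probs.sum()
        i0 = rng.choice(m, p=probs)              # randomized atom selection
        support.append(i0)
        coeffs, *_ = np.linalg.lstsq(D[:, support], z, rcond=None)  # LS over support
        alpha = np.zeros(m)
        alpha[support] = coeffs
        residual = z - D @ alpha
    return alpha
```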

Let's Try the Following: Form a random dictionary D of size 100×200. Multiply it by a sparse vector α_0 having 10 non-zeros. Add Gaussian iid noise v with σ = 1 and obtain z = Dα_0 + v. Solve the P_0 problem by OMP, min_α ||Dα − z||_2^2 s.t. ||α||_0 ≤ 10, and obtain α^OMP. Use Random-OMP and obtain the set {α_j^RandOMP}. Let's look at the obtained representations.

Results: the obtained set {α_j^RandOMP}. The OMP gives the sparsest solution; the Random-OMP representations are denser. As expected, all the representations satisfy ||Dα − z||_2^2 ≤ 100 (= nσ²).

Results: Denoising Performance? Measured by the relative error ||Dα̂ − Dα_0||_2^2 / ||z − Dα_0||_2^2, the OMP solution gives 0.1753. Even though OMP is the sparsest, it is not the most effective for denoising; the cardinality does not reveal its efficiency.

And now to the surprise. Let's propose the average α̂ = (1/1000) Σ_{j=1}^{1000} α_j^RandOMP as our representation. This representation is not sparse at all, and yet ||Dα̂ − Dα_0||_2^2 / ||z − Dα_0||_2^2 = 0.05, compared to 0.1753 for OMP.

Is it Consistent? Let's repeat this experiment many (1000) times: a random dictionary of size n = 100, m = 200; the true support of α has 10 non-zeros; we run OMP for denoising; we run RandOMP J = 1000 times and average, α̂ = (1/J) Σ_{j=1}^J α_j^RandOMP; denoising is assessed as before. Average relative errors: OMP 0.1808, RandOMP 0.1077 (a few trials produce the zero solution). The results of the 1000 trials lead to the same conclusion. How could we explain this?
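
To make the experiment concrete, here is a sketch that runs it once with a plain OMP baseline and the averaged Random-OMP, reusing the random_omp function above. The dictionary size, sparsity, and noise level follow the slide; the value of c, the number of runs J, and all names are illustrative guesses rather than the settings actually used.

```python
import numpy as np

def omp(D, z, sigma):
    """Plain OMP baseline: same loop as random_omp, but always picks
    the atom with the largest correlation to the residual."""
    n, m = D.shape
    support, alpha, residual = [], np.zeros(m), z.copy()
    while np.linalg.norm(residual) ** 2 > n * sigma ** 2 and len(support) < n:
        corr = np.abs(D.T @ residual)
        corr[support] = -np.inf
        support.append(int(np.argmax(corr)))
        coeffs, *_ = np.linalg.lstsq(D[:, support], z, rcond=None)
        alpha = np.zeros(m)
        alpha[support] = coeffs
        residual = z - D @ alpha
    return alpha

# Synthetic experiment: 100x200 random dictionary, 10 non-zeros, sigma = 1.
rng = np.random.default_rng(0)
n, m, k, sigma, J, c = 100, 200, 10, 1.0, 100, 0.5   # c is a hand-tuned guess
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)                        # unit-norm atoms
alpha0 = np.zeros(m)
alpha0[rng.choice(m, size=k, replace=False)] = rng.standard_normal(k)
x0 = D @ alpha0
z = x0 + sigma * rng.standard_normal(n)

a_omp = omp(D, z, sigma)
a_avg = np.mean([random_omp(D, z, sigma, c, rng) for _ in range(J)], axis=0)

rel_err = lambda a: np.sum((D @ a - x0) ** 2) / np.sum((z - x0) ** 2)
print("OMP:", rel_err(a_omp), "averaged RandOMP:", rel_err(a_avg))
```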

A Crash-Course on Estimation Theory

Defining Our Goal: We are interested in the signal x. Unfortunately, we get instead a measurement z. We do know that z is related to x via the conditional probabilities P(z|x) or P(x|z). Estimation theory is all about algorithms for inferring x from z based on these. Obviously, a key element in our story is the need to know these P's; sometimes this is a tough task by itself.

The Maximum-Likelihood (ML): The conditional P(z|x) is known as the likelihood function, describing the probability of getting the measurements z if x is given. A natural estimator to propose is the Maximum Likelihood, x̂_ML = argmax_x P(z|x). ML is very popular, but in many situations it is quite weak or even useless. This brings us to the Bayesian approach.

The MAP Estimation: Moving from P(z|x) to P(x|z) is a deep philosophical change in the approach we take, as now we consider x as random as well. Due to Bayes' rule we have that P(x|z) = P(z|x)·P(x) / P(z). The Maximum A-posteriori Probability (MAP) estimator suggests x̂_MAP = argmax_x P(x|z) = argmax_x P(z|x)·P(x).

The MAP, A Closer Look: The MAP estimator is given by x̂_MAP = argmax_x P(x|z) = argmax_x P(z|x)·P(x). In words, it seeks the most probable x given z, which fits exactly our story (z is given while x is unknown). MAP resembles ML with one major distinction: P(x) enters and affects the outcome; this is known as the prior. This is the Bayesian approach in estimation, but MAP is not the ONLY Bayesian estimator.

MMSE Estimation: Given the posterior P(x|z), what is the best one could do? Answer: find the estimator that minimizes the Mean-Squared-Error, given by MSE(x̂) = ∫ ||x̂ − x||² P(x|z) dx = E[ ||x̂ − x||² | z ]. Let's take a derivative w.r.t. the unknown: 2 ∫ (x̂ − x) P(x|z) dx = 0, so x̂ = ∫ x P(x|z) dx / ∫ P(x|z) dx = E[x|z]. MMSE estimation is a conditional expectation.

MMSE versus MAP: In general, these two estimators are different. They align if the posterior is symmetric and unimodal (e.g., a Gaussian distribution). Typically, MMSE is much harder to compute, which explains the popularity of MAP. [Sketch: an asymmetric posterior P(x|z), with x̂_MAP at its peak and x̂_MMSE at its mean.]
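
As a tiny illustration (not from the slides), the following toy posterior over four values shows how the two estimators can disagree when the posterior is asymmetric; the numbers are made up purely for illustration.

```python
import numpy as np

# Toy asymmetric posterior P(x|z) on a small grid (illustrative numbers only).
x = np.array([0.0, 1.0, 2.0, 5.0])
post = np.array([0.40, 0.30, 0.20, 0.10])   # sums to 1

x_map = x[np.argmax(post)]     # most probable value: 0.0
x_mmse = np.sum(x * post)      # posterior mean: 1.2
print(x_map, x_mmse)           # the two estimates clearly differ
```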

Sparseland: An Estimation Point of View

Our Signal Model: The dictionary D of size n×m is fixed and known. Assume that α is built by: (i) choosing the support S at random, each index entering independently with probability p, i.e., P(k ∈ S) = p; (ii) choosing the coefficients on the support as iid Gaussian entries, α_k ~ N(0, σ_x²). The ideal signal is x = Dα = D_S α_S. Thus P(α) and P(x) are known.

Adding Noise: We measure z = x + v = Dα + v, where the noise v is additive, white, and Gaussian with variance σ². Therefore P(z|x) = C·exp{ −||x − z||² / (2σ²) }, and P(z|α), P(α|z), and even P(z|s) and P(s|z), can all be derived.
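
A short sketch of drawing one signal from this generative model; the function name and the return convention are illustrative.

```python
import numpy as np

def draw_sparseland_signal(D, p, sigma_x, sigma, rng):
    """Draw (alpha, x, z) from the model: each atom enters the support
    independently with probability p, coefficients are iid N(0, sigma_x^2),
    and z = D @ alpha + white Gaussian noise of std sigma."""
    n, m = D.shape
    s = rng.random(m) < p                        # random support
    alpha = np.zeros(m)
    alpha[s] = sigma_x * rng.standard_normal(s.sum())
    x = D @ alpha
    z = x + sigma * rng.standard_normal(n)
    return alpha, x, z
```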

So, Let's Estimate: Given P(α|z) or P(s|z), we consider three options: the oracle (known support s); the MAP, α̂_MAP = argmax_α P(α|z) or ŝ_MAP = argmax_s P(s|z); and the MMSE, α̂_MMSE = E[α|z]. Why the oracle? Because it is a building block in the derivation of MAP and MMSE, and it poses a bound on the best achievable performance.

Deriving the Oracle Estimate: For a known support s, P(α_s | z, s) = P(z | α_s, s)·P(α_s) / P(z|s) ∝ exp{ −||D_s α_s − z||² / (2σ²) }·exp{ −||α_s||² / (2σ_x²) }. This Gaussian posterior gives
α̂_s = ( (1/σ²) D_s^T D_s + (1/σ_x²) I )^{-1} (1/σ²) D_s^T z = Q_s^{-1} h_s,
with Q_s = (1/σ²) D_s^T D_s + (1/σ_x²) I and h_s = (1/σ²) D_s^T z. This is both the MAP and the MMSE for the known support, and the estimate of x is obtained by D_s α̂_s.
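
The oracle formula translates directly into code once the support is given; the helper below is a sketch using the Q_s and h_s defined above, with an illustrative name.

```python
import numpy as np

def oracle_estimate(D, z, support, sigma, sigma_x):
    """Oracle (known support): alpha_s = Q_s^{-1} h_s with
    Q_s = D_s^T D_s / sigma^2 + I / sigma_x^2 and h_s = D_s^T z / sigma^2."""
    Ds = D[:, list(support)]
    k = Ds.shape[1]
    Qs = Ds.T @ Ds / sigma ** 2 + np.eye(k) / sigma_x ** 2
    hs = Ds.T @ z / sigma ** 2
    alpha_s = np.linalg.solve(Qs, hs)
    alpha = np.zeros(D.shape[1])
    alpha[list(support)] = alpha_s
    return alpha                     # the signal estimate is then D @ alpha
```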

The MAP Estimate of the Support: ŝ_MAP = argmax_s P(s|z) = argmax_s P(z|s)·P(s) / P(z). Using a marginalization trick we get P(z|s) = ∫ P(z|s, α_s)·P(α_s) dα_s ∝ exp{ (1/2) h_s^T Q_s^{-1} h_s − (1/2) log det(Q_s) } · (1/σ_x)^{|s|}. The expression within the integral is purely Gaussian, and thus a closed-form expression is within reach.

The MAP Estimate of the Support (cont.): Based on our prior for generating the support, P(s) = p^{|s|}·(1−p)^{m−|s|}, we get
ŝ_MAP = argmax_s exp{ (1/2) h_s^T Q_s^{-1} h_s − (1/2) log det(Q_s) } · ( p / ((1−p)·σ_x) )^{|s|}.

The MAP Estimate: Implications. The maximization ŝ_MAP = argmax_s exp{ (1/2) h_s^T Q_s^{-1} h_s − (1/2) log det(Q_s) } · ( p / ((1−p)·σ_x) )^{|s|} requires testing all the possible supports. Once the support is found, the oracle formula is used to obtain α_s. This process is usually impossible due to the combinatorial number of possibilities. This is why we rarely use the exact MAP and replace it with an approximation (e.g., OMP).

The MMSE Estimation: α̂_MMSE = E[α|z] = Σ_s P(s|z)·E[α|z, s], where P(s|z) ∝ P(s)·P(z|s) ∝ exp{ (1/2) h_s^T Q_s^{-1} h_s − (1/2) log det(Q_s) } · ( p / ((1−p)·σ_x) )^{|s|}, and E[α|z, s] = α̂_s = Q_s^{-1} h_s is simply the oracle for the support s. Therefore α̂_MMSE = Σ_s P(s|z)·α̂_s.

The MMSE Estimation: Implications. α̂_MMSE = Σ_s P(s|z)·α̂_s: the best estimator (in terms of L2 error) is a weighted average of many sparse representations!!! As such, it is not expected to be sparse at all. As in the MAP case, one cannot compute this expression, as the summation is over a combinatorial set of possibilities. We should propose approximations here as well.
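
When the dictionary is small enough to enumerate all supports (as in the 10×16 demo further below), both estimators can be computed exactly. The sketch below does that enumeration using the Q_s, h_s, and log-weights reconstructed above; the weight convention (including the σ_x factor) follows that reconstruction and should be treated as an assumption, and the cost is exponential in m, so this is for toy sizes only.

```python
import numpy as np
from itertools import combinations

def exact_map_mmse(D, z, p, sigma, sigma_x):
    """Exhaustive MAP and MMSE over all supports (exponential in m; tiny m only).
    log P(s|z) up to a constant:
    0.5 h_s^T Q_s^{-1} h_s - 0.5 log det(Q_s) + |s| * log(p / ((1-p) sigma_x))."""
    n, m = D.shape
    log_w, estimates = [], []
    for size in range(m + 1):
        for s in combinations(range(m), size):
            if size == 0:
                lw, alpha = 0.0, np.zeros(m)       # empty support as the baseline
            else:
                Ds = D[:, s]
                Qs = Ds.T @ Ds / sigma ** 2 + np.eye(size) / sigma_x ** 2
                hs = Ds.T @ z / sigma ** 2
                sol = np.linalg.solve(Qs, hs)      # oracle estimate on this support
                lw = (0.5 * hs @ sol
                      - 0.5 * np.linalg.slogdet(Qs)[1]
                      + size * np.log(p / ((1 - p) * sigma_x)))
                alpha = np.zeros(m)
                alpha[list(s)] = sol
            log_w.append(lw)
            estimates.append(alpha)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                   # normalized P(s|z)
    alpha_map = estimates[int(np.argmax(log_w))]   # oracle on the MAP support
    alpha_mmse = np.sum(w[:, None] * np.array(estimates), axis=0)  # weighted average
    return alpha_map, alpha_mmse
```

For a 10×16 dictionary this runs over 2^16 supports and can be compared directly against the averaged Random-OMP approximation.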

Sparseland: Approximate Estimation

The Case |s| = 1: Start from P(s|z) ∝ exp{ (1/2) h_s^T Q_s^{-1} h_s − (1/2) log det(Q_s) } · ( p / ((1−p)·σ_x) )^{|s|}, where Q_s = (1/σ²) D_s^T D_s + (1/σ_x²) I and h_s = (1/σ²) D_s^T z. |s| = 1 implies that D_s has only one column, d_k, and thus Q_s = 1/σ² + 1/σ_x² (a scalar) and h_s = (z^T d_k)/σ². The right-most term is a constant and is omitted. A little bit of algebra and we get P(s|z) ∝ exp{ σ_x²·(z^T d_k)² / (2σ²(σ² + σ_x²)) }.

The Case |s| = 1, A Closer Look: P(s|z) ∝ exp{ σ_x²·(z^T d_k)² / (2σ²(σ² + σ_x²)) } = exp{ c·(z^T d_k)² }; this coefficient is the constant c in the Random-OMP. Based on this we can propose the first step of a greedy algorithm for both MAP and MMSE. MAP: choose the atom with the largest |z^T d_k| value (out of m), which is exactly what OMP does. MMSE: compute these m probabilities and draw an atom at random from this distribution; this is exactly what Random-OMP does.

What About the Next Steps of the Random-OMP? Suppose we have k−1 non-zeros, and we are about to choose the k-th one. Option 1: compute the probabilities P(s|z) ∝ exp{ (1/2) h_s^T Q_s^{-1} h_s − (1/2) log det(Q_s) } for the m−k+1 candidate supports in which the k−1 chosen atoms are held fixed, and use this distribution to either maximize or draw a random atom. Option 2 (simpler): use the same rule as in the first step to proceed, P(s|z) ∝ exp{ c·(r_{k−1}^T d_k)² }, with the current residual replacing z.

A Demo: Relative representation mean-squared-error. These results correspond to a small dictionary (10×16), where the combinatorial formulas can be evaluated as well. Parameters: n×m = 10×16, p = 0.1, σ_x = 1, J = 50 (RandOMP), averaged over 1000 experiments. [Plot: relative representation MSE for the Oracle, MMSE, MAP, OMP, and Rand-OMP estimators.]

A Few Words on the Unitary Case: We will not derive the equations for this case, and simply show the outcome; both MAP and MMSE have a closed-form solution. With c² = σ_x² / (σ² + σ_x²), the MAP estimate is obtained per coefficient by hard-thresholding: α̂_k^MAP = c²·(z^T d_k) if |z^T d_k| exceeds a threshold determined by p, σ, and σ_x, and α̂_k^MAP = 0 otherwise. [Plot: the MAP input-output curve α̂_k^MAP versus z^T d_k is a hard threshold; p = 0.1, σ = 0.3, σ_x = 1.]

A Few Words on the Unitary Case (cont.): As for the MMSE estimate, we get a smoothed shrinkage curve: α̂_k^MMSE = q_k·c²·(z^T d_k), with q_k = L_k / (1 + L_k) and L_k = (p/(1−p))·√(1−c²)·exp{ c²·(z^T d_k)² / (2σ²) }. This leads to a dense representation vector, just like the one we have seen earlier, both in the deblurring result and in the synthetic experiment. [Plot: the MMSE input-output curve α̂_k^MMSE versus z^T d_k is a smooth shrinkage; p = 0.1, σ = 0.3, σ_x = 1.]
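
Here is a sketch of these per-coefficient rules. It is written in terms of the posterior probability that an atom is active, an equivalent form derived from the model above, and it may differ notationally from the slide's parametrization via c; the function name is illustrative.

```python
import numpy as np

def unitary_map_mmse(beta, p, sigma, sigma_x):
    """Per-coefficient MAP and MMSE for a unitary dictionary, beta = D^T z.
    c2 is the oracle shrinkage factor sigma_x^2 / (sigma^2 + sigma_x^2)."""
    c2 = sigma_x ** 2 / (sigma ** 2 + sigma_x ** 2)
    # Prior odds times the Gaussian likelihood ratio P(beta|active)/P(beta|inactive).
    # (A sketch: for very large |beta| the exp may overflow; not numerically hardened.)
    lr = (p / (1 - p)) * np.sqrt(1 - c2) * np.exp(c2 * beta ** 2 / (2 * sigma ** 2))
    q = lr / (1 + lr)                              # posterior P(atom is active | z)
    alpha_map = np.where(lr > 1, c2 * beta, 0.0)   # hard threshold + oracle shrinkage
    alpha_mmse = q * c2 * beta                     # smoothed shrinkage curve
    return alpha_map, alpha_mmse

# Example: unitary_map_mmse(D.T @ z, 0.1, 0.3, 1.0) traces the two curves
# for the parameters quoted above (p = 0.1, sigma = 0.3, sigma_x = 1).
```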

A Demo: Relative representation mean-squared-error for a unitary dictionary of size n = m = 100. In this case the exact MAP and MMSE are accessible, and we also have formulae for their errors. Parameters: n×m = 100×100, p = 0.1, σ_x = 1, 5000 experiments. [Plot: empirical versus theoretical relative MSE for the Oracle, MMSE, and MAP estimators, as a function of the noise level.]

MMSE: Back to Reality

The Main Lesson: The main lesson to take from the above discussion is this: even under the Sparseland model, which assumes that our signals are built as sparse combinations of atoms, estimation of such signals is better done by using a dense representation.

Implications: This is a counter-intuitive statement, and yet it is critical to the use of Sparseland. It may explain a few of the results we saw earlier: in the deblurring experiment we got the best result for a very dense representation, and in the thresholding-based denoising experiment the best threshold led to a dense representation. Warning: this does not mean that any dense representation is good!!

Merging MMSE Estimation into Our Algorithms: The concept of MMSE estimation can be added to various algorithms in order to boost their performance. For example: (i) in the deblurring task, one may seek several sparse explanations of the signal by Random-OMP and then average them; this is challenging due to the dimensions involved. (ii) When denoising patches (e.g., in the K-SVD algorithm), one could replace the OMP by a Random-OMP and get a better denoising outcome; in practice, the benefit is low due to the later patch-averaging.

Here is Another Manifestation of MMSE. Consider the following rationale. Observation: K-SVD denoising applied on portions of an image leads to improved results. Thus, it seems that a locally adaptive dictionary is beneficial; at the extreme, use a different dictionary for each pixel. Such a dictionary is too hard to train, so use the surrounding patches as the atoms. These atoms are noisy, so a sparse representation in this case is bad. Alternative: an MMSE estimate, allowing a dense combination of atoms.

The NLM Algorithm, The Practice:
x̂_{k_0} = Σ_k w(k_0, k)·R_k z / Σ_k w(k_0, k), with w(k_0, k) = exp{ −||R_{k_0} z − R_k z||² / (2h²) },
where R_k is the operator extracting the patch centered at pixel k. Comments: once the cleaned patches are created, they are averaged as they overlap. Note that the original NLM simply uses the center pixel, and thus averaging is not done. Our interpretation of NLM is very far from the original idea the authors had in mind.
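
A minimal sketch of this patch-weighted averaging on flattened patches; the kernel width h, the patch size, and the toy data are assumptions, and the overlapping cleaned patches would then be averaged back into the image as noted above.

```python
import numpy as np

def nlm_denoise_patch(z_patches, k0, h):
    """Denoise the patch at index k0 as a weighted average of all patches,
    with weights exp(-||patch_k0 - patch_k||^2 / (2 h^2))."""
    d2 = np.sum((z_patches - z_patches[k0]) ** 2, axis=1)   # squared patch distances
    w = np.exp(-d2 / (2 * h ** 2))
    w /= w.sum()
    return w @ z_patches      # cleaned patch (a dense combination of noisy "atoms")

# Toy usage: rows of z_patches are noisy patches extracted around each pixel.
rng = np.random.default_rng(1)
z_patches = rng.standard_normal((50, 9))    # 50 patches of size 3x3, flattened
clean_patch_0 = nlm_denoise_patch(z_patches, k0=0, h=3.0)
```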

Summary: Sparsity and redundancy are used for denoising of signals/images. How? Estimation theory tells us exactly what should be done (or approximated). All this provides an interesting explanation of earlier observations. Conclusions? Averaging leads to better denoising, as it approximates the MMSE. So, can we do better than OMP?