Compressed Sensing
David L. Donoho
Presented by: Nitesh Shroff, University of Maryland
Outline
1. Introduction: Compressed Sensing
2. Problem Formulation: Sparse Signal; Problem Statement
3. Proposed Solution: Near Optimal Information; Near Optimal Reconstruction
4. Applications: Example
5. Conclusion
Compressed Sensing: Motivation
Why go to so much effort to acquire all the data when most of what we get will be thrown away?
Can't we just directly measure the part that won't end up being thrown away?
Wouldn't it be possible to acquire the data in already compressed form, so that nothing needs to be thrown away?
Sparse Signal
Definition: a signal $x$ is $K$-sparse if its support $\{ i : x_i \neq 0 \}$ has cardinality less than or equal to $K$.
[Figures: 1-sparse, 2-sparse, and 3-sparse example signals (richb)]
Sparse Signal
Sparse signals have few nonzero coefficients.
Sparsity leads to efficient estimators.
Sparsity leads to dimensionality reduction.
Signals of interest are often sparse or compressible.
This is a known property of standard images and many other digital content types.
Transform compression: the object has transform coefficients $\theta_i = \langle x, \psi_i \rangle$, assumed sparse in the sense that, for some $0 < p < 2$ and some $R > 0$:
$\|\theta\|_p = \left( \sum_i |\theta_i|^p \right)^{1/p} \leq R$
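As a small illustration of these definitions (a sketch, not taken from the paper; the dimensions, the random support, and the choice $\Psi = I$ are arbitrary), the snippet below builds a $K$-sparse coefficient vector and evaluates the $\ell_p$ compressibility condition:

```python
import numpy as np

m, K = 256, 8          # ambient dimension and sparsity level (arbitrary choices)
rng = np.random.default_rng(0)

# K-sparse coefficient vector theta: K nonzero entries at random positions
theta = np.zeros(m)
support = rng.choice(m, size=K, replace=False)
theta[support] = rng.standard_normal(K)

# synthesize x = Psi theta with an orthonormal basis Psi (identity, for simplicity)
Psi = np.eye(m)
x = Psi @ theta

# evaluate the ell_p "compressibility" measure ||theta||_p for some 0 < p < 2
p = 0.5
ell_p = np.sum(np.abs(theta) ** p) ** (1.0 / p)
print(f"support size = {np.count_nonzero(theta)}, ell_{p} norm = {ell_p:.2f}")
```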
Problem Statement: Compressive Data Acquisition
When data is sparse or compressible, we can directly acquire a condensed representation with little or no information loss, through dimensionality reduction:
$y = \Phi x$
Directly acquire compressed data.
Far fewer than $m$ measurements are needed, on the order of $m^{2\alpha} \log(m)$ for $\ell_1$-bounded signals (see the Basis Pursuit bound below).
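A minimal sketch of the acquisition step $y = \Phi x$ (the Gaussian choice of $\Phi$ and the sizes are illustrative assumptions, not prescribed here):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, K = 512, 80, 10                  # ambient dimension, measurements, nonzeros (assumed)

# a K-sparse signal in the identity basis
x = np.zeros(m)
x[rng.choice(m, K, replace=False)] = rng.standard_normal(K)

# one common random choice of Phi: i.i.d. Gaussian entries, scaled by 1/sqrt(n)
Phi = rng.standard_normal((n, m)) / np.sqrt(n)
y = Phi @ x                            # condensed representation: n numbers instead of m
print(y.shape)                         # (80,)
```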
Problem Statement
$x \in \mathbb{R}^m$, belonging to a class $X$ of such signals.
Design an information operator $I_n : X \to \mathbb{R}^n$ that samples $n$ pieces of information about $x$.
Design an algorithm $A_n : \mathbb{R}^n \to \mathbb{R}^m$ that offers an approximate reconstruction of $x$.
Here the information operator takes the form $I_n(x) = (\langle \xi_1, x \rangle, \ldots, \langle \xi_n, x \rangle)$, where the $\xi_i$ are sampling kernels.
Problem Statement
Nonadaptive sampling: the operator is fixed independently of $x$.
$A_n$ is an unspecified, possibly nonlinear reconstruction operator.
We are interested in the $\ell_2$ error of reconstruction:
$E_n(X) = \inf_{A_n, I_n} \sup_{x \in X} \| x - A_n(I_n(x)) \|_2$
Why not adaptive? For $0 < p < 1$,
$E_n(X_{p,m}(R)) \leq 2^{1/p} \, E_n^{\mathrm{Adapt}}(X_{p,m}(R))$
so adaptive information is of minimal help.
Near Optimal Information: Information Operator $I_n$
$\Psi$: orthogonal matrix of basis elements $\psi_i$.
$I_n = \Phi \Psi^T$.
Assume $\Psi$ is the identity, for simplicity.
$J \subset \{1, 2, \ldots, m\}$.
$\Phi_J$: submatrix of $\Phi$ with columns indexed by $J$.
$V_J$: the range of $\Phi_J$ in $\mathbb{R}^n$.
Near Optimal Information: CS Conditions
Structural conditions on an $n \times m$ matrix which imply that its nullspace is optimal.
CS1: the minimal singular value of $\Phi_J$ exceeds $\eta_1 > 0$ uniformly in $|J| < \rho n / \log(m)$.
This demands a certain quantitative degree of linear independence among all small groups of columns.
CS2: on each subspace $V_J$ we have, uniformly in $|J| < \rho n / \log(m)$,
$\|v\|_1 \geq \eta_2 \cdot \sqrt{n} \cdot \|v\|_2, \quad v \in V_J$
This says that linear combinations of small groups of columns give vectors that look much like random noise, at least as far as the comparison of $\ell_1$ and $\ell_2$ norms is concerned.
Equivalently, every $V_J$ slices through the $\ell_1^m$ ball in such a way that the resulting convex section is actually close to spherical.
Near Optimal Information: CS Conditions
Family of quotient norms on $\mathbb{R}^n$:
$Q^{J^c}(v) = \min \|\theta\|_{\ell_1(J^c)}$ subject to $\Phi_{J^c}\, \theta = v$
These describe the minimal $\ell_1$-norm representation of $v$ achievable using only the specified subsets of columns of $\Phi$.
CS3: on each subspace $V_J$, uniformly in $|J| < \rho n / \log(m)$,
$Q^{J^c}(v) \geq \frac{\eta_3}{\sqrt{\log(m/n)}} \, \|v\|_1, \quad v \in V_J$
This says that for every vector in some $V_J$, the associated quotient norm $Q^{J^c}$ is never dramatically better than the simple $\ell_1$ norm on $\mathbb{R}^n$.
Matrices satisfying these conditions are ubiquitous for large $n$ and $m$.
It is possible to choose $\eta_i$ and $\rho$ independent of $n$ and of $m$.
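The following rough numerical sanity check (not from the paper) estimates the CS1 and CS2 constants for a random $\Phi$ by sampling a few subsets $J$; CS3 is omitted since evaluating the quotient norm requires solving a linear program for each subset:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 100, 400
Phi = rng.standard_normal((n, m))
Phi /= np.linalg.norm(Phi, axis=0)            # columns on the unit sphere S^{n-1}

rho = 0.5
k = int(rho * n / np.log(m))                  # subset size limit |J| < rho * n / log(m)

min_sv, min_ratio = np.inf, np.inf
for _ in range(200):                          # random subsets J (exhaustive check is infeasible)
    J = rng.choice(m, size=k, replace=False)
    Phi_J = Phi[:, J]
    # CS1: smallest singular value of Phi_J should stay bounded away from 0
    min_sv = min(min_sv, np.linalg.svd(Phi_J, compute_uv=False).min())
    # CS2: for v in V_J = range(Phi_J), ||v||_1 should exceed eta_2 * sqrt(n) * ||v||_2
    v = Phi_J @ rng.standard_normal(k)
    min_ratio = min(min_ratio, np.linalg.norm(v, 1) / (np.sqrt(n) * np.linalg.norm(v, 2)))

print(f"empirical eta_1 >= {min_sv:.3f}, empirical eta_2 >= {min_ratio:.3f}")
```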
Near Optimal Information: Ubiquity
Theorem: Let $(n, m_n)$ be a sequence of problem sizes with $n \to \infty$, $n < m_n$, and $m_n \sim A n^\gamma$, $A > 0$, $\gamma \geq 1$. There exist $\eta_i > 0$ and $\rho > 0$ depending only on $A$ and $\gamma$ so that, for each $\delta > 0$, the proportion of all $n \times m_n$ matrices $\Phi$ satisfying CS1-CS3 with parameters $(\eta_i)$ and $\rho$ eventually exceeds $1 - \delta$.
Near Optimal Information: The $\Phi$ Matrix
Random projection $\Phi$: an $n \times m$ matrix with $n < m$, so it cannot be inverted.
It loses information in general,
but it preserves the structure and information in sparse signals with high probability.
Generated by randomly sampling the columns, with the columns drawn i.i.d. uniform on the unit sphere $S^{n-1}$.
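One way to generate such a $\Phi$, as a sketch: draw i.i.d. Gaussian columns and normalize them, which makes each column uniform on $S^{n-1}$:

```python
import numpy as np

def uniform_sphere_columns(n, m, seed=0):
    """Random n x m matrix whose columns are i.i.d. uniform on the unit sphere S^{n-1}."""
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((n, m))          # isotropic Gaussian columns ...
    return Phi / np.linalg.norm(Phi, axis=0)   # ... normalized: direction uniform on S^{n-1}

Phi = uniform_sphere_columns(100, 400)
print(np.allclose(np.linalg.norm(Phi, axis=0), 1.0))   # True: every column has unit norm
```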
Near Optimal Reconstruction: Reconstruction Algorithm
Least-norm solution:
$(P_p): \quad \min \|\theta(x)\|_p$ subject to $y_n = I_n(x)$
Drawbacks:
$p$ must be known.
For $p < 1$, this is a nonconvex optimization problem.
Solving the $\ell_0$ version requires combinatorial optimization.
Near Optimal Reconstruction: Basis Pursuit
Let $A_{n \times 2m} = [\Phi \;\; -\Phi]$.
The linear program
$\min_z \mathbf{1}^T z$ subject to $A z = y_n,\; z \geq 0$
has solution $z^* = [u^* \; v^*]$,
and we set $\hat{\theta} = u^* - v^*$.
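A small sketch of this linear program using scipy's generic LP solver (the problem sizes, the Gaussian $\Phi$, and the use of `linprog` with the HiGHS method are illustrative choices, not the paper's):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
m, n, K = 120, 40, 5                         # illustrative sizes
theta = np.zeros(m)
theta[rng.choice(m, K, replace=False)] = rng.standard_normal(K)

Phi = rng.standard_normal((n, m)) / np.sqrt(n)
y = Phi @ theta                              # measurements (Psi = identity here)

# Basis Pursuit as an LP:  min 1^T z  s.t.  [Phi, -Phi] z = y,  z >= 0,  theta_hat = u - v
A = np.hstack([Phi, -Phi])
res = linprog(c=np.ones(2 * m), A_eq=A, b_eq=y, bounds=(0, None), method="highs")
z = res.x
theta_hat = z[:m] - z[m:]

print("recovery error:", np.linalg.norm(theta - theta_hat))
```

With $K$ small relative to $n / \log(m)$, the recovered $\hat{\theta}$ typically matches $\theta$ to solver precision, illustrating the $\ell_0$-$\ell_1$ equivalence stated below.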
Near Optimal Reconstruction: Basis Pursuit
Fix $\epsilon > 0$.
Consider $x$ such that $\|\theta\|_1 \leq c\, m^\alpha$, with $\alpha < 1/2$.
Make $n \geq C_\epsilon \, m^{2\alpha} \log(m)$ measurements.
Then $\|x - \hat{x}_{1,\theta}\|_2 \leq \epsilon \, \|x\|_2$.
Near Optimal Reconstruction: Reconstruction Algorithm
When $(P_0)$ has a sufficiently sparse solution, $(P_1)$ will find it.
Theorem: Suppose that $\Phi$ satisfies CS1-CS3 with given positive constants $\rho$, $(\eta_i)$. There is a constant $\rho_0 > 0$, depending only on $\rho$ and $(\eta_i)$ and not on $n$ or $m$, so that if $\theta$ has at most $\rho_0 \, n / \log(m)$ nonzeros, then $(P_0)$ and $(P_1)$ both have the same unique solution.
In other words: although the system of equations is massively underdetermined, $\ell_1$ minimization and the sparsest solution coincide when the result is sufficiently sparse.
Example: Partial Fourier Ensemble
Collection of $n \times m$ matrices made by sampling $n$ rows out of the $m \times m$ Fourier matrix.
These give concrete examples of $\Phi$ working within the CS framework,
allowing $\ell_1$ minimization to reconstruct from such information for all $0 < p < 1$.
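A sketch of one member of this ensemble, built by selecting $n$ random rows of the unitary $m \times m$ DFT matrix (the sizes are arbitrary):

```python
import numpy as np

def partial_fourier(n, m, seed=0):
    """n x m matrix: n rows drawn at random from the m x m unitary DFT matrix."""
    rng = np.random.default_rng(seed)
    rows = rng.choice(m, size=n, replace=False)
    k = np.arange(m)
    # selected rows of the DFT matrix, scaled so each row has unit ell_2 norm
    return np.exp(-2j * np.pi * np.outer(rows, k) / m) / np.sqrt(m)

Phi = partial_fourier(64, 256)
print(Phi.shape, np.allclose(np.linalg.norm(Phi, axis=1), 1.0))   # (64, 256) True
```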
Example: Images of Bounded Variation
$f(x)$, $x \in [0,1]^2$, bounded in absolute value: $|f| \leq B$.
The level-$j$ wavelet coefficients obey $\|\theta^{(j)}\|_1 \leq C \cdot B \cdot 2^{j}$.
Fix a finest scale $j_1$ and take $j_0 = j_1 / 2$.
CS takes the $4^{j_0}$ resume (coarse-scale) coefficients directly, plus $n \approx c\, 4^{j_0} \log^2(4^{j_1})$ samples for the detail coefficients.
Total: $c \cdot j_1^2 \cdot 4^{j_1/2}$ pieces of measured information.
The error is of the same order as linear sampling with $4^{j_1}$ samples:
$\|f - \hat{f}\|_2 \leq c\, 2^{-j_1}$, with $c$ independent of $f$.
Example: Piecewise $C^2$ Images with $C^2$ Edges
Class $C^{2,2}(B, L)$ of piecewise smooth $f(x)$, $x \in [0,1]^2$:
bounded in absolute value, and in first and second partial derivatives, by $B$;
edge curves of total length $\leq L$.
Fix a finest scale $j_1$ and take $j_0 = j_1 / 4$.
CS takes the $4^{j_0}$ resume coefficients directly, plus $n \approx c\, 4^{j_0} \log^{5/2}(4^{j_1})$ samples for the detail coefficients.
Total: $c \cdot j_1^{5/2} \cdot 4^{j_1/4}$ pieces of measured information.
The error is of the same order as linear sampling with $4^{j_1}$ samples:
$\|f - \hat{f}\|_2 \leq c\, 2^{-j_1}$, with $c$ independent of $f$.
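To make these counts concrete, a short sketch evaluates the number of measured pieces of information for both examples at a fixed finest scale (the value $j_1 = 10$ and the constant $c = 1$ are arbitrary choices for illustration):

```python
import numpy as np

j1, c = 10, 1.0                       # finest scale and constant (arbitrary, for illustration)
m = 4 ** j1                           # number of pixels at scale j1

# Bounded-variation images: j0 = j1/2, resume coefficients plus CS samples for details
j0_bv = j1 // 2
n_bv = 4 ** j0_bv + c * 4 ** j0_bv * np.log(4 ** j1) ** 2

# Piecewise-C^2 images with C^2 edges: j0 = j1/4
j0_pc = j1 // 4
n_pc = 4 ** j0_pc + c * 4 ** j0_pc * np.log(4 ** j1) ** 2.5

print(f"pixels m = {m}")
print(f"BV model:       ~{n_bv:.0f} measurements  (m^(1/2) log^2 m scaling)")
print(f"piecewise C^2:  ~{n_pc:.0f} measurements  (m^(1/4) log^(5/2) m scaling)")
```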
Conclusion
A framework for compressed sensing of objects $x \in \mathbb{R}^m$.
Conditions CS1-CS3 on $\Phi$.
Information operator $I_n(x) = \Phi \Psi^T x$.
Random sampling yields a near-optimal $I_n$ with overwhelmingly high probability.
$x$ is reconstructed by solving an $\ell_1$ convex optimization problem.
Exploits a priori signal sparsity information. [richb]
Thank You!