Compressive Sensing
Richard Baraniuk, Rice University

Acknowledgements
For assistance preparing this presentation:
- Rice DSP group: Petros Boufounos, Volkan Cevher, Mark Davenport, Marco Duarte, Chinmay Hegde, Jason Laska, Shri Sarvotham
- Mike Wakin (University of Michigan): geometry of CS, embeddings
- Justin Romberg (Georgia Tech): optimization algorithms
Agenda
- Introduction to Compressive Sensing (CS): motivation, basic concepts
- CS theoretical foundation: geometry of sparse and compressible signals, coded acquisition, restricted isometry property (RIP), signal recovery
- CS applications
- Related concepts and work

Pressure Is on Digital Sensors
The success of digital data acquisition is placing increasing pressure on signal/image processing hardware and software to support:
- higher resolution / denser sampling (ADCs, cameras, imaging systems, microarrays, ...)
- large numbers of sensors (image databases, camera arrays, distributed wireless sensor networks, ...)
- increasing numbers of modalities (acoustic, RF, visual, IR, UV, x-ray, gamma ray, ...)
The result is a deluge of data: how to acquire, store, fuse, and process it efficiently?

Digital Data Acquisition
Foundation: the Shannon sampling theorem. If you sample densely enough (at the Nyquist rate), you can perfectly reconstruct the original analog data (in time or space).
Sensing by Sampling
Long-established paradigm for digital data acquisition:
- uniformly sample the data at the Nyquist rate (2x the Fourier bandwidth): we sample too much data!
- then compress the data (JPEG, JPEG2000)
Pipeline: sample, compress, transmit/store; receive, decompress.
Sparsity / Compressibility
[figure: an image's pixels vs. its few large wavelet coefficients (blue = 0); a wideband signal's samples vs. its few large Gabor (time-frequency) coefficients]

Sample / Compress
Long-established paradigm: uniformly sample the data at the Nyquist rate, then compress it (via a sparse/compressible wavelet transform) before transmit/store; receive, then decompress.
What's Wrong with this Picture?
Why go to all the work to acquire N samples only to discard all but K pieces of data?
The sampling step is linear processing with a linear signal model (a bandlimited subspace), while the compression step is nonlinear processing with a nonlinear signal model (a union of subspaces).
Compressive Sensing
Directly acquire compressed data: replace samples by more general measurements. Pipeline: compressive sensing, transmit/store; receive, reconstruct.

Compressive Sensing Theory I: A Geometrical Perspective
Sampling
The signal x is K-sparse in some basis/dictionary; WLOG assume it is sparse in the space domain, with only K nonzero entries.

Compressive Sampling
When data is sparse/compressible, we can directly acquire a condensed representation with no/little information loss through linear dimensionality reduction: measurements y = Φx, where Φ is an MxN matrix with M < N and x has only K nonzero entries.
How Can It Work?
The projection Φ is not full rank (M < N) and so loses information in general: infinitely many x's map to the same y.
But we are only interested in sparse vectors. For a K-sparse x, only K columns of Φ are active, so Φ is effectively MxK. Design Φ so that each of its MxK submatrices is full rank.
Stronger goal: design Φ so that each of its Mx2K submatrices is full rank. The difference between two K-sparse vectors is 2K-sparse in general, so this preserves the information in K-sparse signals. This is the Restricted Isometry Property (RIP) of order 2K.
Unfortunately...
Goal: design Φ so that its Mx2K submatrices are full rank (the Restricted Isometry Property, RIP). Unfortunately, this is a combinatorial, NP-complete design problem.
Insight from the '80s [Kashin, Gluskin]: draw Φ at random, with iid Gaussian or iid Bernoulli (+/-1) entries. Then Φ has the RIP with high probability: its Mx2K submatrices are full rank, giving a stable embedding for sparse signals. The result extends to compressible signals in ℓp balls.
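A quick numerical illustration of the random-matrix insight (a sketch with illustrative sizes, not from the slides): an iid Gaussian matrix, properly scaled, approximately preserves the norm of every sparse vector it is applied to.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 512, 128, 10

# iid Gaussian measurement matrix, scaled so that E||Phi x||^2 = ||x||^2
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Check the near-isometry on random K-sparse vectors
ratios = []
for _ in range(200):
    x = np.zeros(N)
    support = rng.choice(N, K, replace=False)
    x[support] = rng.standard_normal(K)
    ratios.append(np.linalg.norm(Phi @ x) / np.linalg.norm(x))

print(min(ratios), max(ratios))  # ratios cluster near 1
```

The concentration tightens as M grows (fluctuations shrink like 1/sqrt(M)), which is the mechanism behind the "RIP with high probability" claim.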
Compressive Data Acquisition
The measurements y = Φx are random linear combinations of the entries of the K-sparse signal x. With high probability, Φ does not distort the structure of sparse signals: no information loss.

CS Signal Recovery
Goal: recover the signal x from the measurements y. Problem: the random projection Φ is not full rank (an ill-posed inverse problem). Solution: exploit the sparse/compressible geometry of the acquired signal.
Concise Signal Structure
Sparse signal: only K out of N coordinates are nonzero. Model: a union of K-dimensional subspaces aligned with the coordinate axes.
Compressible signal: the sorted coordinates decay rapidly to zero (power-law decay). Model: an ℓp ball with p < 1.

CS Signal Recovery
Recovery problem: given y = Φx, find x. Since the random projection Φ is not full rank, it has a null space: the candidate solutions form an (N-M)-dimensional hyperplane at a random angle, translated to pass through x. So search in this translated null space for the "best" x according to some criterion, e.g., least squares.
CS Signal Recovery
Recovery (an ill-posed inverse problem): given y = Φx, find x (sparse).
- ℓ2 minimization (least squares / pseudoinverse): fast, but wrong. For signals sparse in the space/time domain, the minimum ℓ2-norm solution is almost never sparse (the null space of Φ, translated to x, sits at a random angle).
- ℓ0 minimization (minimize the number of nonzero entries): find the sparsest x in the translated null space. Correct: only M = 2K measurements are required to stably reconstruct a K-sparse signal. But slow: an NP-complete algorithm.
- ℓ1 minimization: correct and efficient with mild oversampling [Candes, Romberg, Tao; Donoho]. It is a linear program, and the number of measurements required is M ≥ cK log(N/K).
Why ℓ1 works: for signals sparse in the space/time domain, the minimum ℓ1-norm solution equals the sparsest solution (with high probability) if M ≥ cK log(N/K).
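The ℓ1 recovery step really is just a linear program. Here is a minimal basis-pursuit sketch (illustrative sizes, not the slides' own code) using SciPy's LP solver, with the standard split x = u - v, u, v ≥ 0:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
N, M, K = 128, 64, 5

Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = Phi @ x_true

# Basis pursuit as an LP: minimize sum(u) + sum(v)
# subject to Phi @ (u - v) = y, with u, v >= 0
c = np.ones(2 * N)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
x_hat = res.x[:N] - res.x[N:]

print(np.max(np.abs(x_hat - x_true)))  # exact recovery, up to solver precision
```

With M = 64 measurements for a K = 5 sparse signal of length N = 128, we are well inside the M ≥ cK log(N/K) regime, so recovery is exact with overwhelming probability.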
Universality
Random measurements can be used for signals sparse in any basis.
Universality
Random measurements can be used for signals sparse in any basis: write x = Ψα, where the coefficient vector α has only K nonzero entries; then y = ΦΨα.

Compressive Sensing
Directly acquire compressed data: replace N samples by M random measurements; transmit/store; receive; reconstruct via a linear program.
Compressive Sensing Theory II: Stable Embedding

Johnson-Lindenstrauss Lemma
JL Lemma: a random projection stably embeds a cloud of Q points with high probability provided M = O(ε^-2 log Q). Proved via a concentration inequality. The same techniques link the JLL to the RIP [Baraniuk, Davenport, DeVore, Wakin, Constructive Approximation, 2008].
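The JL lemma is easy to check numerically (illustrative parameters, not from the slides): project Q random points into M = O(ε^-2 log Q) dimensions and compare all pairwise distances before and after.

```python
import numpy as np

rng = np.random.default_rng(2)
N, Q = 10_000, 50                     # ambient dimension, number of points
eps = 0.3
M = int(8 * np.log(Q) / eps**2)       # JL target dimension ~ log(Q) / eps^2

points = rng.standard_normal((Q, N))
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
proj = points @ Phi.T

# Worst-case relative distortion over all pairwise distances
worst = 0.0
for i in range(Q):
    for j in range(i + 1, Q):
        d_orig = np.linalg.norm(points[i] - points[j])
        d_proj = np.linalg.norm(proj[i] - proj[j])
        worst = max(worst, abs(d_proj / d_orig - 1))

print(M, worst)  # distortion stays below eps
```

Note that M depends only on Q and ε, not on the ambient dimension N: 50 points in 10,000 dimensions embed into a few hundred dimensions with small distortion.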
Connecting JL to RIP
Consider the effect of a random JL projection on each K-plane:
- construct a covering of points Q on the unit sphere
- JL: isometry for each point with high probability
- union bound: isometry for all points q in Q
- extend to an isometry for all x in the K-plane
- union bound again: isometry for all K-planes
Favorable JL Distributions
- Gaussian
- Bernoulli/Rademacher [Achlioptas]
- "Database-friendly" [Achlioptas]
- Random orthoprojection to R^M [Dasgupta, Gupta]

RIP as a Stable Embedding
RIP of order 2K implies: for all K-sparse x1 and x2,
(1-δ) ||x1-x2||^2 ≤ ||Φx1-Φx2||^2 ≤ (1+δ) ||x1-x2||^2.
Connecting JL to RIP [with R. DeVore, M. Davenport, R. Baraniuk]
Theorem: Suppose Φ is drawn from a JL-favorable distribution* with M = O(K log(N/K)). Then with probability at least 1 - e^(-cM), Φ meets the RIP.
* Gaussian / Bernoulli / database-friendly / orthoprojector
Bonus: universality (repeat the argument for any basis Ψ). See also Mendelson et al. concerning subgaussian ensembles.

Compressive Sensing Recovery Algorithms
CS Recovery Algorithms
- Convex optimization: linear programming, BPDN, GPSR, FPC, Bregman iteration, Dantzig selector, ... [with or without noise]
- Iterative greedy algorithms: Matching Pursuit (MP), Orthogonal Matching Pursuit (OMP), StOMP, ROMP, CoSaMP, ...

Compressive Sensing Theory Summary
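Of the greedy family, OMP is the simplest to sketch. The following is an illustrative NumPy implementation (a sketch, not the slides' code): at each step, pick the column most correlated with the residual, then re-fit by least squares on the selected support.

```python
import numpy as np

def omp(Phi, y, K):
    """Orthogonal Matching Pursuit: greedily estimate a K-sparse x from y = Phi @ x."""
    M, N = Phi.shape
    residual = y.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(K):
        # Select the column most correlated with the current residual
        idx = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(idx)
        # Least-squares fit over the selected columns, then update the residual
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coeffs
    x_hat = np.zeros(N)
    x_hat[support] = coeffs
    return x_hat

rng = np.random.default_rng(3)
N, M, K = 256, 100, 8
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.choice([-1.0, 1.0], K) * (1 + rng.random(K))
x_hat = omp(Phi, Phi @ x, K)
print(np.max(np.abs(x_hat - x)))
```

The re-fitting step is what distinguishes OMP from plain MP: the residual stays orthogonal to every column already selected, so no column is chosen twice.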
CS Hallmarks
- CS changes the rules of the data-acquisition game: it exploits a priori signal sparsity information.
- Stable: the acquisition/recovery process is numerically stable.
- Universal: the same random projections / hardware can be used for any compressible signal class (generic).
- Asymmetrical (most processing at the decoder): conventional compression uses a smart encoder and a dumb decoder; CS uses a dumb encoder and a smart decoder.
- Random projections are weakly encrypted.
- Democratic: each measurement carries the same amount of information, so CS is robust to measurement loss and quantization, with simple encoding. Ex: a wireless streaming application with data loss. Conventional approach: complicated (unequal) error protection of the compressed data (DCT/wavelet low-frequency coefficients). CS: merely stream additional measurements and reconstruct using those that arrive safely (fountain-like).
Compressive Sensing In Action: Art
Gerhard Richter, "4096 Farben / 4096 Colours," 1974. 254 cm x 254 cm, lacquer on canvas. Catalogue Raisonné: 359. Museum collection: Staatliche Kunstsammlungen Dresden (on loan). Sales history: 11 May 2004, Christie's New York, Post-War and Contemporary Art (Evening Sale), Lot 34, US$3,703,500.
Gerhard Richter: Cologne Cathedral stained glass

Compressive Sensing In Action: Cameras
Single-Pixel CS Camera (with Kevin Kelly)
[figure: the scene is imaged onto a random pattern displayed on a DMD (digital micromirror device) array and focused onto a single photon detector; image reconstruction or processing follows]
Flip the mirror array M times to acquire M measurements; sparsity-based (linear programming) recovery.
Single-Pixel Camera hardware: object, LED light source, Lens 1, DMD + ALP board, Lens 2, photodiode circuit.

First Image Acquisition: a 65536-pixel target, reconstructed from 11000 measurements (16%) and from 1300 measurements (2%).
Second Image Acquisition: 4096 pixels, 500 random measurements.
CS Low-Light Imaging with a PMT: true-color low-light imaging; a 256 x 256 image with 10:1 compression [Nature Photonics, April 2007].

Hyperspectral Imaging: CS spectrometer [figure: blue, red, and near-IR bands].
Video Acquisition
Measure via a stream of random projections (a "shutterless" camera); reconstruct using a sparse model for video structure: 3-D wavelets (localized in space-time).
Example (original 64x64x64 video cube): 3-D wavelet thresholding with 2000 coefficients, MSE = 3.5; frame-by-frame 2-D CS reconstruction with 20000 coefficients, MSE = 18.4; joint 3-D CS reconstruction with 20000 coefficients, MSE = 3.2.
Miniature DMD-based Cameras: the TI DLP picoprojector is destined for cell phones.

Georgia Tech Analog Imager: the bottleneck in imager arrays is data readout. Instead of quantizing pixel values, take CS inner products in analog; potential for tremendous (factor of 10000) power savings [Hasler, Anderson, Romberg, et al].
Other Imaging Apps
- Compressive camera arrays: the sampling rate scales logarithmically in both the number of pixels and the number of cameras.
- Compressive radar and sonar: simplified receiver; exploring radar/sonar networks.
- Compressive DNA microarrays: smaller, more agile arrays for bio-sensing; exploit sparsity in the presentation of organisms to the array [O. Milenkovic et al].

Compressive Sensing In Action: A/D Converters
Analog-to-Digital Conversion
The Nyquist rate limits the reach of today's ADCs. "Moore's Law" for ADCs: a technology figure of merit incorporating sampling rate and dynamic range doubles only every 6-8 years.
DARPA Analog-to-Information (A2I) program: wideband signals have a high Nyquist rate but are often sparse/compressible; develop new ADC technologies that exploit new tradeoffs among Nyquist rate, sampling rate, and dynamic range. [figure: frequency-hopper spectrogram, frequency vs. time]

Analog-to-Information Conversion
Sample near the signal's (low) information rate rather than its (high) Nyquist rate: the A2I sampling rate scales with the number of tones per window rather than the Nyquist bandwidth. Practical hardware: the randomized demodulator (cf. a CDMA receiver).
Example: Frequency Hopper
Nyquist-rate sampling vs. 20x sub-Nyquist sampling: spectrogram vs. "sparsogram."

Analog-to-Information Conversion
analog signal -> A2I -> digital measurements -> DSP -> information/statistics
An analog-to-information (A2I) converter takes an analog input signal and creates discrete (digital) measurements: an extension of the analog-to-digital converter that samples at the signal's information rate rather than its Nyquist rate.
Streaming Measurements
Streaming applications cannot fit the entire signal into a processing buffer at one time, so streaming requires special measurements.

Random Demodulator
Random Demodulator
[figure: pseudorandom +/-1 chipping waveform over the sampling window]
Parallel Architecture
[figure: the input x(t) is multiplied by a bank of chipping sequences p_1(t), ..., p_Z(t); each branch is integrated over windows of length mT and sampled to give outputs y_1[m], ..., y_Z[m]] [Candes, Romberg, Wakin, et al]

Random Demodulator Theorem [Tropp et al 2007]: if the sampling rate satisfies R = O(K log W) (up to polylogarithmic factors in the bandwidth W), then locally Fourier K-sparse signals can be recovered exactly with high probability.
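A discrete-time sketch of the random-demodulator measurement operator (chip by a pseudorandom +/-1 sequence, then integrate-and-dump); the sizes are illustrative, not the hardware's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, K = 512, 64, 4       # Nyquist-rate length, low-rate samples, number of tones

# Pseudorandom +/-1 chipping sequence at the Nyquist rate
chips = rng.choice([-1.0, 1.0], size=N)

def demodulate(x):
    """Chip the signal, then integrate-and-dump over windows of length N // M."""
    return (chips * x).reshape(M, N // M).sum(axis=1)

# A test signal that is K-sparse in frequency (a few tones)
t = np.arange(N)
freqs = rng.choice(N // 2, K, replace=False)
x = sum(np.cos(2 * np.pi * f * t / N) for f in freqs)

y = demodulate(x)          # M = 64 measurements: 8x sub-Nyquist
print(y.shape)
```

The whole pipeline is equivalent to multiplying x by a fixed MxN matrix (block-summation times a diagonal of chips), so the standard CS recovery machinery applies directly.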
Empirical Results
Example: frequency hopper. Random-demodulator AIC at 8x sub-Nyquist: spectrogram vs. sparsogram.
Compressive Sensing In Action: Data Processing

Information Scalability
Many applications involve signal inference rather than reconstruction. In order of increasing complexity: detection < classification < estimation < reconstruction, with reconstruction being fairly computationally intense.
Good news: CS supports efficient learning, inference, and processing directly on the compressive measurements. Random projections act as approximate sufficient statistics for signals with concise geometrical structure.

Matched Filter
Detection/classification with K unknown articulation parameters. Ex: the position and pose of a vehicle in an image; Ex: the time delay of a radar signal return. The matched filter performs joint parameter estimation and detection/classification: compute a sufficient statistic for each potential target and articulation, then compare the best statistics to detect/classify.
Matched Filter Geometry
Detection/classification with K unknown articulation parameters: images are points in R^N. Classify by finding the closest target template to the data, via distance or inner product (optimal for AWG noise); the target templates (points) come from a generative model or training data.
As the template's articulation parameter changes, the template points map out a K-dimensional nonlinear manifold in R^N. Matched-filter classification = closest-manifold search over the articulation parameter space.
CS for Manifolds
Theorem: random measurements stably embed a manifold with high probability [Baraniuk, Wakin, FOCM 08]; related work: [Indyk and Naor, Agarwal et al., Dasgupta and Freund]. Proved via concentration-inequality arguments (the JLL/CS relation).
This stable embedding enables parameter estimation and matched-filter detection/classification directly on the compressive measurements. K (the number of articulations) is very small in many applications.
Example: Matched Filter
Detection/classification with K = 3 unknown articulation parameters: 1. horizontal translation, 2. vertical translation, 3. rotation.

Smashed Filter
Detection/classification with K = 3 unknown articulation parameters (manifold structure): a dimensionally reduced matched filter running directly on the compressive measurements.
Smashed Filter
Random shift and rotation (a K = 3 dimensional manifold); noise added to the measurements. Goal: identify the most likely position for each image class, then identify the most likely class using a nearest-neighbor test.
[figure: average shift-estimate error and classification rate (%) vs. number of measurements M, for increasing noise levels]

Compressive Sensing (Theoretical Aside): CS for Manifolds
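A toy smashed-filter sketch (hypothetical templates and noise level, chosen for illustration): nearest-template classification performed entirely in the compressed domain, never reconstructing the image.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 1024, 32

# Two hypothetical "target templates" (e.g., pulses at different positions)
t = np.arange(N)
templates = [np.exp(-0.5 * ((t - c) / 20.0) ** 2) for c in (300, 700)]

Phi = rng.standard_normal((M, N)) / np.sqrt(M)
proj_templates = [Phi @ s for s in templates]

# Classify noisy compressive measurements by the nearest projected template
trials = 200
correct = 0
for _ in range(trials):
    label = int(rng.integers(2))
    y = Phi @ (templates[label] + 0.05 * rng.standard_normal(N))
    guess = int(np.argmin([np.linalg.norm(y - p) for p in proj_templates]))
    correct += (guess == label)

print(correct / trials)  # high classification rate despite M << N
```

Only M = 32 numbers per trial are ever touched, versus N = 1024 pixels; the random projection preserves the distances between templates well enough for the nearest-neighbor test to succeed.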
Random Projections
Random projections preserve information: compressive sensing (sparse-signal embeddings), the Johnson-Lindenstrauss lemma (point-cloud embeddings), K-planes. What about manifolds?

Manifold Embeddings [Whitney (1936); Sauer et al. (1991)]
For a K-dimensional, smooth, compact manifold, M ≥ 2K+1 linear measurements are necessary for injectivity (in general) and sufficient for injectivity (w.p. 1) when Gaussian. But injectivity alone is not enough for efficient, robust recovery.
Recall: K-planes Embedding
For K-planes, M ≥ 2K linear measurements are necessary for injectivity and sufficient for injectivity (w.p. 1) when Gaussian, but again not enough for efficient, robust recovery.

Theorem: Stable Manifold Embedding [with R. Baraniuk]
Let F ⊂ R^N be a compact K-dimensional manifold with condition number 1/τ (controlling curvature and self-avoidance) and volume V. Let Φ be a random MxN orthoprojector with M = O(K log(N V / τ) / ε^2) (up to constants and the failure probability). Then with probability at least 1-ρ, the following holds: for every pair x1, x2 ∈ F, the projected distance ||Φx1 - Φx2|| lies within a factor (1 ± ε) of sqrt(M/N) ||x1 - x2||.
Sketch of proof: construct a sampling of points on the manifold at fine resolution from local tangent spaces; apply JL to these points; extend to the entire manifold.
Implication: nonadaptive (even random) linear projections can efficiently capture and preserve the structure of a manifold. See also: Indyk and Naor, Agarwal et al., Dasgupta and Freund.
Manifold Learning
Given training points in R^N, learn the mapping to the underlying K-dimensional articulation manifold: ISOMAP, LLE, HLLE, ... Ex: images of a rotating teapot, where the articulation space is a circle. Ex: up/down and left/right articulation, a K = 2 manifold [Tenenbaum, de Silva, Langford].
Compressive Manifold Learning
The ISOMAP algorithm is based on geodesic distances between points, and random measurements preserve these distances. Theorem [Hegde et al 08]: if enough random measurements are taken, the ISOMAP residual variance in the projected domain is bounded by the full-data residual variance plus an additive error factor.
Example: a translating-disk manifold (K = 2); compare the full data (N = 4096) against M = 100, M = 50, and M = 25 measurements.

Compressive Sensing In Action: Networked Sensing
Multi-Signal CS
Sensor networks exhibit intra-sensor and inter-sensor correlation. Can we exploit these to jointly compress? The popular approach is collaboration, but inter-sensor communication imposes overhead, and this remains an ongoing challenge in information theory (Slepian-Wolf coding, ...). One solution: compressive sensing.

Distributed CS (DCS)
Measure separately, reconstruct jointly. Ingredients: models for joint sparsity, algorithms for joint reconstruction, and theoretical results for measurement savings. Zero collaboration, trivially scalable, robust; low complexity, universal encoding.

Real Data Example
Light sensing in the Intel Berkeley Lab: 49 sensors, N = 1024 samples each, Ψ = wavelets, K = 100, M = 400.
Multisensor Fusion
A network of J sensors views an articulating object; the collection of images lies on a K-dimensional manifold. Compare the complexity/bandwidth for fusion and inference: naive with no compression; individual CS (sensor by sensor); joint CS (fuse all sensors).
Multi-Camera Vision Apps
Background subtraction from multi-camera compressive video: the innovations (due to object motion) are sparse in the space domain; estimate the innovations from random projections of the video; only a few measurements are required per camera thanks to the overlapping fields of view.
3-D shape inference from compressive measurements: estimate the innovations from random projections of the video, then fuse them into a 3-D shape by exploiting the object's sparsity in the 3-D volume.
Compressive Sensing: Related Work
Streaming algorithms; multiband sampling; finite rate of innovation (FRI) sampling; general sparse approximation.
Compressive Sensing: The End

CS Summary
Compressive sensing integrates sensing, compression, and processing; exploits signal sparsity/compressibility information; and enables new sensing modalities, architectures, and systems.
Why CS works: it provides a stable embedding for signals with concise geometric structure: sparse signals, compressible signals, and manifolds.
Information scalability for compressive inference: compressive measurements act as approximate sufficient statistics, so many fewer measurements may be required to detect/classify/estimate than to reconstruct.
dsp.rice.edu/cs