A Parallel Algorithm for Compressive Sensing

1 A Parallel Algorithm for Compressive Sensing
Alexandre Borghi, Laboratoire de Recherche en Informatique, Université Paris-Sud 11
July

2 Idea of Compressive Sensing
Compressive Sensing idea [Candès et al. 2004]: under some sparsity assumptions, one can exactly reconstruct a signal from few measurements.
Sparse signal = most of its coefficients are 0 (in some basis).
Measurements need to be taken in an appropriate space.
Example: sparse gradient, with measurements in the Fourier domain. Frequencies kept: 22 lines passing through the zero frequency.

3 Compressive Sensing [Candès, Romberg, Tao, ...]
We wish to solve
$$\min_u J(u) \quad \text{s.t.} \quad Au = f,$$
with $J$ being the $\ell_1$-norm (or the Total Variation), where $A$ is a compressive sensing matrix and $f$ the observed data.
In this part of the talk, we consider a penalization approach (LASSO):
$$E_\mu(u) = J(u) + \frac{\mu}{2} \|Au - f\|_2^2.$$
Target: minimize $E_\mu(\cdot)$. Many algorithms are available!
Each one involves a mathematical design, data structures/algorithm, and an implementation.
Efficiency depends on the available computing technologies: parallel computing capacities are specific to each processor, and not all algorithms can benefit from the available features.
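To make the penalized energy concrete, here is a minimal sketch that evaluates $E_\mu(u)$ with $J$ taken as the $\ell_1$-norm. The function names (`matvec`, `lasso_energy`) and the tiny dense matrix are illustrative assumptions, not part of the talk; a real implementation would use a fast transform or an optimized matrix product.

```python
def matvec(A, u):
    """Dense product A u, with A stored as a list of rows (illustrative only)."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]


def lasso_energy(A, f, u, mu):
    """E_mu(u) = ||u||_1 + (mu / 2) * ||A u - f||_2^2, with J assumed to be the l1-norm."""
    residual = [r - fi for r, fi in zip(matvec(A, u), f)]
    l1_term = sum(abs(x) for x in u)
    fit_term = sum(r * r for r in residual)
    return l1_term + 0.5 * mu * fit_term


# Hypothetical 2 x 3 example: two measurements of a length-3 signal.
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
f = [1.0, 0.0]

print(lasso_energy(A, f, [1.0, 0.0, 0.0], mu=10.0))  # 1.0: exact fit, only ||u||_1 remains
print(lasso_energy(A, f, [0.0, 0.0, 0.0], mu=10.0))  # 5.0: zero signal, (mu/2) * ||f||^2
```

A larger $\mu$ penalizes the data misfit more heavily, so the minimizer of $E_\mu$ moves closer to the constraint $Au = f$.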

4 $\ell_1$-Compressive Sensing
Sparsity is promoted by minimizing the $\ell_1$-norm:
$$(S) \quad \min_u \|u\|_1 \quad \text{s.t.} \quad Au = f,$$
where
$u \in \mathbb{R}^n$ is the signal to reconstruct,
$f \in \mathbb{R}^m$ is the observed data,
$A \in \mathbb{R}^{m \times n}$ is a Compressive Sensing matrix,
$m \ll n$.

5 Outline
1. What kind of parallel computing technologies are widely available?
2. Design of an approximate $\ell_1$-compressive sensing algorithm (sparse signal)

6 Parallel Many-Core Architectures: CPU
[Figure: Intel CPU (left: Core 2 Quad; right: Core i7). Left: four cores, each with its own L1 cache, two shared L2 caches, FSB. Right: four cores, each with its own L1+L2 caches, one shared L3 cache, QPI.]

7 Parallel Many-Core Architectures: Cell
[Figure: Cell. Eight SPEs, each with its local memory, and one PPE with L1 and L2 caches, connected by the EIB.]

8 Parallel Many-Core Architectures: GPU
[Figure: NVIDIA GPU (left: G80; right: G200). Groups of computing units with local memories and a shared L1 cache, several memory banks, and a PCI Express link.]

9 Parallel Many-Core Architectures
Restriction to Parallel Many-Core Architectures (widely available):
Video card: NVIDIA GPU (> 96 cores)
PC: Intel multi-core processors (> 4 cores)
Playstation 3 video game console: Cell (6-8 cores), cheap!
Specific characteristics: GFLOPS, energy consumption, price, ...
Common features:
Parallel: composed of several computing units
Shared memory: central memory accessible from the computing units
A cache hierarchy to improve data transfer

10 Features and Requirements
Coarse parallelism: several computational units (multi-core, GPU, Cell). Synchronization issue.
Vectorization (SIMD): process the data as a vector. Alignment issue: data must be accessed at specific addresses. By far the most important feature (because there are several SIMD units per core).
Memory bandwidth: the amount of data that can be transferred. Data are transferred back and forth between the memory and the processor. Starvation issue: a computing unit waits for its data. By far the most important issue.
A compiler (or Matlab) tries to handle these automatically, but performance highly depends on successful code optimization.
Design algorithms whose implementations meet these requirements.

11 $\ell_1$-compressive sensing
1. What kind of parallel computing technologies are widely available?
2. Design of an approximate $\ell_1$-compressive sensing algorithm (sparse signal)
Moreau-Yosida Regularization/Proximal Points
Proximal operator choices
Algorithm
Experiments and implementation

12 Moreau-Yosida Regularization
Recall that we wish to minimize
$$E_\mu(u) = J(u) + \frac{\mu}{2} \|Au - f\|_2^2.$$
Moreau-Yosida Regularization [Moreau 65, Yosida 66]: given a metric $M$ and any current point $u^{(k)}$,
$$F_\mu(u^{(k)}) = \inf_{u \in \mathbb{R}^n} \left\{ E_\mu(u) + \frac{1}{2} \|u - u^{(k)}\|_M^2 \right\}.$$
This defines the proximal point $p_\mu(u^{(k)})$, which is characterized by
$$0 \in \partial\left( E_\mu + \frac{1}{2} \|\cdot - u^{(k)}\|_M^2 \right)\bigl(p_\mu(u^{(k)})\bigr),$$
that is,
$$p_\mu(u^{(k)}) = (M + \partial E_\mu)^{-1}\bigl(M u^{(k)}\bigr).$$
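A scalar illustration may help. It rests on assumptions made only for this example: $n = 1$, $M = 1$, and $E_\mu$ replaced by $E(u) = |u|$ (the data term is dropped). In that case the proximal point and the regularized function have closed forms:

```latex
p(x) = \arg\min_{u \in \mathbb{R}} \Bigl\{ |u| + \tfrac{1}{2}(u - x)^2 \Bigr\}
     = \operatorname{sgn}(x)\,\max\bigl(|x| - 1,\, 0\bigr),
\qquad
F(x) = \begin{cases}
  \tfrac{1}{2}x^2 & \text{if } |x| \le 1,\\[2pt]
  |x| - \tfrac{1}{2} & \text{otherwise.}
\end{cases}
```

Here $F$ is the Huber function: it is differentiable everywhere, and it has the same minimizer ($x = 0$) as $|u|$, which is why iterating $x \mapsto p(x)$ drives the point to that minimizer.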

13 Proximal Operator Choices
Standard convergence result [Lemaréchal 97, Rockafellar 76, ...]: iterating the proximal point generates a sequence that converges toward a minimizer of $E_\mu$.
How to choose $M$ for parallel computations?
Rewriting $(M + \partial E_\mu)^{-1}$, we have
$$\partial J(u^{(k+1)}) + (\mu A^t A + M)\, u^{(k+1)} \ni \mu A^t f + M u^{(k)}.$$
We wish $\mu A^t A + M$ to be diagonal, which gives a problem of the form $\ell_1 + \|\cdot - g\|_2^2$.
So we take
$$M = (1+\epsilon)\mu\,\mathrm{Id} - \mu A^t A.$$
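As a worked step, substituting this choice of $M$ into the optimality condition shows why the system becomes diagonal. This is only a sketch of the algebra; the symbol $g^{(k)}$ is shorthand introduced here for the right-hand side.

```latex
\begin{aligned}
\mu A^t A + M
  &= \mu A^t A + (1+\epsilon)\mu\,\mathrm{Id} - \mu A^t A
   = (1+\epsilon)\mu\,\mathrm{Id},\\
g^{(k)} := \mu A^t f + M u^{(k)}
  &= (1+\epsilon)\mu\,u^{(k)} + \mu A^t\bigl(f - A u^{(k)}\bigr),\\
\partial J\bigl(u^{(k+1)}\bigr) + (1+\epsilon)\mu\,u^{(k+1)}
  &\ni g^{(k)}.
\end{aligned}
```

With $J = \|\cdot\|_1$ the last line decouples across components, so each entry of $u^{(k+1)}$ can be computed independently of the others. For $M$ to define a metric it has to be positive definite, which holds when $\|A^t A\| < 1 + \epsilon$; this is the case, for instance, when the rows of $A$ are orthonormal (an assumption stated here, not on the slide).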

14 Proximal Operator Choices
With such a metric,
$$M = (1+\epsilon)\mu\,\mathrm{Id} - \mu A^t A,$$
the solution is given by a (componentwise) shrinkage:
$$u^{(k+1)} = \frac{1}{(1+\epsilon)\mu} \begin{cases} \mu A^t f + M u^{(k)} - \operatorname{sgn}\bigl(\mu A^t f + M u^{(k)}\bigr) & \text{if } \bigl|\mu A^t f + M u^{(k)}\bigr| > 1, \\ 0 & \text{otherwise.} \end{cases}$$
Shrinkage has been very well studied [Chambolle, Daubechies, Elad, Figueiredo, Yin, ...] and is used in many efficient $\ell_1$ CS algorithms.
Shrinkage is easily and efficiently implemented on many-core architectures.
We need to perform $Au$ and $A^t A u$ in parallel:
If $A$ is a fast transform: good (not great) parallel and vectorized version.
If $A$ is an explicit matrix: great vectorized version but poorly parallel (bandwidth issue).
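The shrinkage update can be sketched in a few lines. This is a minimal, sequential illustration assuming $J = \|\cdot\|_1$ and an explicit dense matrix stored as a list of rows; the names `matvec`, `rmatvec`, `shrink`, and `proximal_step` are hypothetical, and no parallel or SIMD optimization is attempted.

```python
def matvec(A, u):
    """A u for a dense matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]


def rmatvec(A, r):
    """A^t r for the same storage."""
    return [sum(row[j] * ri for row, ri in zip(A, r)) for j in range(len(A[0]))]


def shrink(g):
    """Componentwise shrinkage with threshold 1: g - sgn(g) if |g| > 1, else 0."""
    return [x - 1.0 if x > 1.0 else x + 1.0 if x < -1.0 else 0.0 for x in g]


def proximal_step(A, f, u, mu, eps):
    """One update u^(k) -> u^(k+1) with M = (1 + eps) * mu * Id - mu * A^t A."""
    c = (1.0 + eps) * mu
    residual = [fi - ri for fi, ri in zip(f, matvec(A, u))]
    back = rmatvec(A, residual)
    # mu A^t f + M u  ==  c u + mu A^t (f - A u)
    g = [c * ui + mu * bi for ui, bi in zip(u, back)]
    return [x / c for x in shrink(g)]


# Hypothetical example: A keeps the first two coordinates of a length-3 signal.
A = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
print(proximal_step(A, [2.0, 0.1], [0.0, 0.0, 0.0], mu=1.0, eps=0.25))
```

Every output component depends only on the matching component of $g$, so the shrinkage itself is trivially parallel; the cost is concentrated in the products with $A$ and $A^t$.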

15 Algorithm
Recall that we wish to minimize (with $\mu$ large)
$$E_\mu(u) = J(u) + \frac{\mu}{2} \|Au - f\|_2^2.$$
How to choose the initial $\mu$?
Too small: the shrinkage yields 0.
Too big: slow convergence.
A good value is chosen by a bitonic search.
Algorithm:
1. Start with $u = 0$.
2. Compute the initial value of $\mu$ using the above bitonic approach.
3. For $k = 0$ to $l_{\max}$: iterate the proximal point until some convergence criterion is met:
$$\frac{\|Au^{k} - f\|_2^2}{\|f\|_2^2} < \text{tolerance}.$$
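The iteration with its stopping rule can be sketched as follows. Several things here are assumptions: $\mu$ is simply passed in (the bitonic search is not specified in enough detail to reproduce), $J$ is the $\ell_1$-norm, $f \neq 0$, and all function names are hypothetical.

```python
def matvec(A, u):
    """A u for a dense matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]


def rmatvec(A, r):
    """A^t r for the same storage."""
    return [sum(row[j] * ri for row, ri in zip(A, r)) for j in range(len(A[0]))]


def shrink(g):
    """Componentwise shrinkage with threshold 1."""
    return [x - 1.0 if x > 1.0 else x + 1.0 if x < -1.0 else 0.0 for x in g]


def proximal_point(A, f, mu, eps=0.1, tol=1e-4, max_iter=1000):
    """Iterate the shrinkage update from u = 0 until ||Au - f||^2 / ||f||^2 < tol."""
    c = (1.0 + eps) * mu
    f_norm2 = sum(fi * fi for fi in f)  # assumed nonzero
    u = [0.0] * len(A[0])
    rel, iterations = 1.0, 0
    for iterations in range(1, max_iter + 1):
        residual = [fi - ri for fi, ri in zip(f, matvec(A, u))]
        g = [c * ui + mu * bi for ui, bi in zip(u, rmatvec(A, residual))]
        u = [x / c for x in shrink(g)]
        new_residual = [fi - ri for fi, ri in zip(f, matvec(A, u))]
        rel = sum(r * r for r in new_residual) / f_norm2
        if rel < tol:
            break
    return u, iterations, rel


# Hypothetical example with orthonormal rows and a 1-sparse signal.
A = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
u, iterations, rel = proximal_point(A, [1.0, 0.0], mu=1000.0)
print(u, iterations, rel)
```

Because the penalized minimizer does not satisfy $Au = f$ exactly, the tolerance can only be reached when $\mu$ is large enough; this is one reason the choice of $\mu$ matters.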

16 Exact solution from approximate solution
1. Compute the basis matrix $B$ associated with $u^{(k+1)}$ by choosing the columns of $A$ corresponding to the nonzero elements of $u^{(k+1)}$.
2. Compute $v = B^{-1} f$.
3. Return $u^{(k+2)}$ in base $A$ from $v$ in base $B$ by adding zeros where necessary.
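The three steps above can be sketched as follows. One assumption goes beyond the slide: since $B$ is generally not square, $B^{-1} f$ is interpreted here as the least-squares solution obtained from the normal equations $B^t B v = B^t f$, which presumes the selected columns are linearly independent. The helper names are hypothetical.

```python
def solve_linear(M, b):
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def refine_on_support(A, f, u_approx):
    """Re-solve for the nonzero entries of u_approx only, then pad with zeros."""
    # Step 1: B keeps the columns of A on the support of the approximate solution.
    support = [j for j, x in enumerate(u_approx) if x != 0.0]
    B = [[row[j] for j in support] for row in A]
    k = len(support)
    # Step 2 (assumed least-squares reading of B^{-1} f): B^t B v = B^t f.
    BtB = [[sum(row[p] * row[q] for row in B) for q in range(k)] for p in range(k)]
    Btf = [sum(row[p] * fi for row, fi in zip(B, f)) for p in range(k)]
    v = solve_linear(BtB, Btf)
    # Step 3: back to the full-length signal, zeros off the support.
    u = [0.0] * len(u_approx)
    for j, vj in zip(support, v):
        u[j] = vj
    return u


# Hypothetical example: the approximate solution has the right support, biased values.
A = [[1.0, 0.0, 1.0],
     [0.0, 2.0, 1.0]]
f = [2.0, -1.0]  # generated by the signal [3, 0, -1]
print(refine_on_support(A, f, [2.9, 0.0, -0.95]))
```

This step removes the bias that the shrinkage introduces on the nonzero coefficients, provided the support found by the approximate solver is correct.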

17 Experiments and Implementation
Implementation: the shrinkage is parallel and vectorizable. $Au$ is implemented either as an explicit matrix multiplication or as a fast transform (e.g., DCT). Explicit multiplication means a huge amount of data to transfer.
Experimental conditions:
Multi-core: Intel Core 2 Quad Q6600, 76 GFLOPS (theoretical)
Multi-core: Intel Core i7 920, 86 GFLOPS (theoretical)
Cell: Sony Playstation 3, 159 GFLOPS (theoretical)
GPU: NVIDIA 8800 GTS, 345 GFLOPS (theoretical)
GPU: NVIDIA GTX 275, 1010 GFLOPS (theoretical)

18 Experiments: Errors
[Figure: two panels showing the "Tolerance" and "Ground truth" error curves versus the number of iterations.]

19 Experiments: DCT
[Figure: wall clock time (s) versus signal size (n) for Core 2 Quad (4 threads), Core i7 (4 threads), PS3 Cell, NVIDIA 8800 GTS GPU, and NVIDIA GTX 275 GPU.]

20 Experiments: Orthogonalized Gaussian matrices
For the Intel Core i7 920 (2.66 GHz).
Signal of size $n$ with $m$ observations and $0.1m$ spikes.
Relative error: $\|u_{\text{found}} - u_{\text{true}}\|_2 / \|u_{\text{true}}\|_2$.
[Table: input columns $m$, $n$; output columns relative error, #iter; time (s) with 1, 2, 4, and 8 threads.]
Compared to the Bregman-based approach [Yin, Osher, Goldfarb, Darbon 08]: between 5 and 10 times faster.

21 Experiments: Approximate versus Exact resolution
[Figure: comparison of the execution times of our approximate method. Time (s) versus n for Approx (1 thread), Approx (4 threads), Exact (1 thread), and Exact (4 threads).]

22 Experiments: Versus Matching Pursuit
[Figure: GPU comparison between our method (Prox) and the Matching Pursuit algorithm (MP). Time (s) versus n.]

23 Conclusion
So far:
Use a proximal operator (to ensure convergence).
Design the operator to allow parallel programming.
The compiler will perform the parallel optimization for you.
On Moreau-Yosida/Proximal Point:
A huge literature, which gave birth to splitting/preconditioning.
General theoretical properties: convergence, robustness, ... [Rockafellar 98] or [Hiriart-Urruty-Lemaréchal 94].
Many similar approaches use shrinkage (linearization for instance) or other proximal points.
More in [Borghi, Darbon, Peyronnet, Chan, Osher 08] (UCLA CAM report).
Use a similar idea for $J(u) = \|u\|_{\ell_1}$.
