Multiple-View Object Recognition in Band-Limited Distributed Camera Networks

Size: px

Start display at page:

Download "Multiple-View Object Recognition in Band-Limited Distributed Camera Networks"

Daniella Patterson
6 years ago
Views:

1 in Band-Limited Distributed Camera Networks Allen Y. Yang, Subhransu Maji, Mario Christoudas, Kirak Hong, Posu Yan Trevor Darrell, Jitendra Malik, and Shankar Sastry Fusion, 2009

2 Classical Object Recognition Affine invariant features, SIFT.

3 Classical Object Recognition Affine invariant features, SIFT. SIFT Feature Matching [Lowe 1999, van Gool 2004] (a) Autostitch (b) Recognition

4 Classical Object Recognition Affine invariant features, SIFT. SIFT Feature Matching [Lowe 1999, van Gool 2004] (a) Autostitch (b) Recognition Bag of Words [Nister 2006]

5 SIFT Feature Coding in Sensor Networks In band-limited camera networks 1 Compress scalable SIFT tree [Girod et al. 2009] Observation 1: Tree histogram can be fully reconstructed from leaf nodes. Observation 2: Leaf node histogram is largely sparse (up to dim) R : Sequence of consecutive zero bins. S : Sequence of nonzero bin values.

Observation 2: Leaf node histogram is largely sparse (up to 10 6 -dim) R : Sequence of consecutive

6 SIFT Feature Coding in Sensor Networks In band-limited camera networks 1 Compress scalable SIFT tree [Girod et al. 2009] Observation 1: Tree histogram can be fully reconstructed from leaf nodes. Observation 2: Leaf node histogram is largely sparse (up to dim) R : Sequence of consecutive zero bins. S : Sequence of nonzero bin values. 2 Multiple-view SIFT feature selection [Darrell et al. 2008]

7 Problem Statement 1 L camera sensors observe a single object in 3-D.

8 Problem Statement 1 L camera sensors observe a single object in 3-D. 2 The mutual information between cameras are unknown, cross-sensor communication is prohibited.

9 Problem Statement 1 L camera sensors observe a single object in 3-D. 2 The mutual information between cameras are unknown, cross-sensor communication is prohibited. 3 On each camera, construct an encoding function for a nonnegative, sparse histogram x i f : x i R D y i R d

10 Problem Statement 1 L camera sensors observe a single object in 3-D. 2 The mutual information between cameras are unknown, cross-sensor communication is prohibited. 3 On each camera, construct an encoding function for a nonnegative, sparse histogram x i f : x i R D y i R d 4 On the base station, upon receiving y 1, y 2,, y L, simultaneously recover and classify the object class in space. x 1, x 2,, x L,

11 Key Observations (a) Histogram 1 (b) Histogram 2 All histograms are nonnegative and sparse.

12 Key Observations (a) Histogram 1 (b) Histogram 2 All histograms are nonnegative and sparse. Multiple-view histograms share joint sparse patterns.

13 Key Observations (a) Histogram 1 (b) Histogram 2 All histograms are nonnegative and sparse. Multiple-view histograms share joint sparse patterns. Classification is based on pairwise similarity measure in l 2 -norm (linear kernel) or l 1 -norm (intersection kernel).

14 Random Projection as Encoding Function y = Ax Coefficients of A R D d are drawn from zero-mean Gaussian distribution.

15 Random Projection as Encoding Function y = Ax Coefficients of A R D d are drawn from zero-mean Gaussian distribution. Johnson-Lindenstrauss Lemma For n number of point cloud in R D, given distortion threshold ɛ, for any d > O(ɛ 2 log n), a Gaussian random projection f (x) = Ax R d preserves pairwise l 2 -distance (1 ɛ) x i x j 2 2 f (x i ) f (x j ) 2 2 (1 + ɛ) x i x j 2 2.

16 Classification in Random Projection Space Projection only applies to leaf-node histogram x 4 (a) Level 1 3 (b) Level 4 (leaf nodes) x T = [x (1) R, x (2) R 10, x (3) R 100, x (4) R 1000 ].

17 Classification in Random Projection Space Projection only applies to leaf-node histogram x 4 (a) Level 1 3 (b) Level 4 (leaf nodes) x T = [x (1) R, x (2) R 10, x (3) R 100, x (4) R 1000 ]. Direct classification can be applied using projected leaf histogram (NN or SVM) y = Ax (4).

18 Classification in Random Projection Space Projection only applies to leaf-node histogram x 4 (a) Level 1 3 (b) Level 4 (leaf nodes) x T = [x (1) R, x (2) R 10, x (3) R 100, x (4) R 1000 ]. Direct classification can be applied using projected leaf histogram (NN or SVM) y = Ax (4). Advantages about Random Projection 1 Easy to generate and update. 2 Does not need training prior (universal dimensionality reduction). 3 faster recognition speed.

19 Experiment I: COIL-100 object database Database: 100 objects, each provides 72 images captured with 5 degree difference.

20 Experiment I: COIL-100 object database Database: 100 objects, each provides 72 images captured with 5 degree difference. SIFT features: Dense sampling of overlapping 8 8 grids. Standard SIFT descriptor. 4-level hierarchical k-means (k = 10): Leaf-node histogram is 1000-D.

21 Experiment I: COIL-100 object database Database: 100 objects, each provides 72 images captured with 5 degree difference. SIFT features: Dense sampling of overlapping 8 8 grids. Standard SIFT descriptor. 4-level hierarchical k-means (k = 10): Leaf-node histogram is 1000-D. Setup: For each object class, randomly select 10 image for training. Classifier via linear SVM.

22 From J-L Lemma to Compressive Sensing (a) J-L lemma (b) Compressive sensing 1 Problem I: J-L lemma does not provide means to reconstruct full hierarchy tree.

23 From J-L Lemma to Compressive Sensing (a) J-L lemma (b) Compressive sensing 1 Problem I: J-L lemma does not provide means to reconstruct full hierarchy tree. 2 Problem II: Gaussian projection does not preserve l 1 -distance (for intersection kernels).

24 From J-L Lemma to Compressive Sensing (a) J-L lemma (b) Compressive sensing 1 Problem I: J-L lemma does not provide means to reconstruct full hierarchy tree. 2 Problem II: Gaussian projection does not preserve l 1 -distance (for intersection kernels). 3 Problem III: Previous codec s impose explicit mutual information between fixed camera locations.

25 From J-L Lemma to Compressive Sensing (a) J-L lemma (b) Compressive sensing 1 Problem I: J-L lemma does not provide means to reconstruct full hierarchy tree. 2 Problem II: Gaussian projection does not preserve l 1 -distance (for intersection kernels). 3 Problem III: Previous codec s impose explicit mutual information between fixed camera locations. Compressive sensing provides principled solutions to the above problems.

26 Compressive Sensing Noise-free case: Assume x 0 is sufficiently k-sparse. Given triplet (D, d, k) and mild condition for A, (P 1 ) : min x 1 subject to y = Ax recovers the exact solution.

27 Compressive Sensing Noise-free case: Assume x 0 is sufficiently k-sparse. Given triplet (D, d, k) and mild condition for A, (P 1 ) : min x 1 subject to y = Ax recovers the exact solution. Noisy case: Assume x 0 is sufficiently k-sparse and bounded noise e 2 ɛ: y = Ax 0 + e. A quadratic program recovers a bounded near solution: x x 0 2 < Cɛ: (P 1 ) : min x 1 subject to y Ax 2 ɛ

28 Matching Pursuit 1 Initialization: y = [A; A] x, where x 0 k 0; x 0; r 0 y; Sparse support I = a3 a1 y a2 a3 a2 a1

29 Matching Pursuit 1 Initialization: y y = [A; A] x, where x 0 a1 a2 k 0; x 0; r 0 y; Sparse support I = a3 a3 2 k k + 1: i = arg max j I {a T j r k 1 } x1a1 a1 a2 r 1 y a2 a1 Update: I = I {i}; x i = a T i r k 1 ; r k = r k 1 x i a i a3 a3 a2 a1

30 Matching Pursuit 1 Initialization: y y = [A; A] x, where x 0 a1 a2 k 0; x 0; r 0 y; Sparse support I = a3 a3 2 k k + 1: i = arg max j I {a T j r k 1 } x1a1 a1 a2 r 1 y a2 a1 Update: I = I {i}; x i = a T i r k 1 ; r k = r k 1 x i a i a3 a3 3 If: r k 2 > ɛ, go to STEP 2; Else: output x x1a1 a1 a2 r 1 y a1 a2 Fail to search sparse solution on the boundary of the quotient polytope. x3a3 a3 a3 a2 a1

31 Polytope Faces Pursuit Dual linear program [Chen et al. 1998, Plumbley 2006] min 1 T x max y T c y=ã x Ã T c 1

32 Polytope Faces Pursuit Dual linear program [Chen et al. 1998, Plumbley 2006] min 1 T x max y T c y=ã x Ã T c 1 y a1 a2 a + 1 a + 2 a3 c+,., a + 3 c,.,+ a3 c,,. a2 a1 Definition: a + i. = a i a i 2 2 A vertex of the polar polytope at the intersection of hyperplanes a + 1 and a+ 2 : c,,.. = [ a1, a 2 ] T 1

33 Simulation 1 Initialization r 0 a1 a2 a3 c 0

34 Simulation 1 Initialization 2 Find face: A I = {a 1 }. r 0 x1a1 r 1 a1 a2 a2 a3 c 1 a3 c 0

35 Simulation 1 Initialization 2 Find face: A I = {a 1 }. r 0 x1a1 r 1 a1 a2 a2 a3 c 1 a3 c 0 3 Pursuit on the hyperplane x1a1 r 1 a2 c 1 a3

36 Simulation 1 Initialization 2 Find face: A I = {a 1 }. r 0 x1a1 r 1 a1 a2 a2 a3 c 1 a3 c 0 3 Pursuit on the hyperplane 4 Find face: A I = {a 1, a 2 }. r 1 x1a1 a2 a2 c 1 a3 x1a1 c 2 x2a2 a3

37 PFP: Algorithm 1 Convert y = Ax to 2 Initialization: k 0; x 0; r 0 y; Sparse support I = c 0 = 0 3 k k + 1: I = I {i}. 4 Update: x I = (ÃI ) y; r k = y Ã x c k = ((A I ) ) T 1 y = [A; A] x, where x 0. i = arg min j I {α at j (c k 1 + αr k 1 ) = 1} 5 If: x I contains negative coefficients, remove indexes from I, go to STEP 4. 6 If: r k 2 > ɛ, go to STEP 3; Else: output x

38 Distributed Object Recognition in Smart Camera Networks Outlines: 1 How to enforce nonnegativity in SIFT histograms?

39 Distributed Object Recognition in Smart Camera Networks Outlines: 1 How to enforce nonnegativity in SIFT histograms? 2 How to enforce joint sparsity across multiple camera views?

40 Enforcing Nonnegativity One advantage of PFP is that enforcing nonnegativity is trivial: 1 Algebraically: Do not add antipodal vertexes 2 Geometrically: Pursuit on positive faces y = [A; -A ] x a2 x1a1 c 2 x2a2 a3

41 Sparse Innovation Model Definition (SIM): x 1 = x + z 1,. x L = x + z L. x is called the joint sparse component, and z i is called an innovation.

42 Sparse Innovation Model Definition (SIM): x 1 = x + z 1,. x L = x + z L. x is called the joint sparse component, and z i is called an innovation. Joint recovery of SIM y 1. = y L y = A x R dl. A 1 A A L 0 0 A L x z 1.. z L

43 Sparse Innovation Model Definition (SIM): x 1 = x + z 1,. x L = x + z L. x is called the joint sparse component, and z i is called an innovation. Joint recovery of SIM y 1. = y L y = A x R dl. A 1 A A L 0 0 A L x z 1.. z L 1 New histogram vector is nonnegative and sparse.

44 Sparse Innovation Model Definition (SIM): x 1 = x + z 1,. x L = x + z L. x is called the joint sparse component, and z i is called an innovation. Joint recovery of SIM y 1. = y L y = A x R dl. A 1 A A L 0 0 A L x z 1.. z L 1 New histogram vector is nonnegative and sparse. 2 Joint sparsity x is automatically determined by l 1 -min: No prior training, no assumption about fixing cameras.

45 Simulation 1 Comparison between orthogonal matching pursuit, polytope faces pursuit, and sparse innovation model: Table: Simulation of solving 1000-D sparse histograms with d = 200, k = 60, and L = 3. Sparsity (60,0) (40,20) (30,30) l 0 OMP l 2 OMP l 0 PFP l 2 MMV l 0 SIM l 2 SIM

46 CITRIC: Wireless Smart Camera Platform CITRIC platform Available library functions 1 Full support Intel IPP Library and OpenCV. 2 JPEG compression: 10 fps. 3 Edge detector: 3 fps. 4 Background Subtraction: 5 fps.

47 CITRIC: Wireless Smart Camera Platform CITRIC platform Available library functions 1 Full support Intel IPP Library and OpenCV. 2 JPEG compression: 10 fps. 3 Edge detector: 3 fps. 4 Background Subtraction: 5 fps. Speed-Up Robust Features (SURF) on CITRIC

48 Experiment II: Recognition on COIL-100

49 Distributed Object Recognition in Band-Limited Smart Camera Networks 1 To harness the smart camera capacity, the system is separated in two components: distributed feature extraction and centralized recognition.

50 Distributed Object Recognition in Band-Limited Smart Camera Networks 1 To harness the smart camera capacity, the system is separated in two components: distributed feature extraction and centralized recognition. 2 Gaussian random projection as universal dimensionality reduction function: J-L lemma.

51 Distributed Object Recognition in Band-Limited Smart Camera Networks 1 To harness the smart camera capacity, the system is separated in two components: distributed feature extraction and centralized recognition. 2 Gaussian random projection as universal dimensionality reduction function: J-L lemma. 3 Polytope faces pursuit exploits two properties of general SIFT histograms: Sparsity. Nonnegativity.

52 Distributed Object Recognition in Band-Limited Smart Camera Networks 1 To harness the smart camera capacity, the system is separated in two components: distributed feature extraction and centralized recognition. 2 Gaussian random projection as universal dimensionality reduction function: J-L lemma. 3 Polytope faces pursuit exploits two properties of general SIFT histograms: Sparsity. Nonnegativity. 4 Sparse innovation model exploits joint sparsity of multiple-view object histograms.

53 Distributed Object Recognition in Band-Limited Smart Camera Networks 1 To harness the smart camera capacity, the system is separated in two components: distributed feature extraction and centralized recognition. 2 Gaussian random projection as universal dimensionality reduction function: J-L lemma. 3 Polytope faces pursuit exploits two properties of general SIFT histograms: Sparsity. Nonnegativity. 4 Sparse innovation model exploits joint sparsity of multiple-view object histograms. 5 Complete system implemented on Berkeley CITRIC sensors. References Distributed Compression and Fusion of Nonnegative Sparse Signals for. Information Fusion, in Band-Limited Distributed Camera Networks. Submitted to ICDSC, 2009.

Multiple-View Object Recognition in Band-Limited Distributed Camera Networks

Multiple-View Object Recognition in Band-Limited Distributed Camera Networks Allen Y Yang, Subhransu Maji, C Mario Christoudias, Trevor Darrell, Jitendra Malik, and S Shankar Sastry Department of EECS,