Subspace Indexing on Grassmann Manifold for Large Scale Visual Recognition


1 Subspace Indexing on Grassmann Manifold for Large Scale Visual Recognition
Zhu Li, Univ. of Missouri, Kansas City

2 Outline
- Short Intro
- Research Motivation and Background
- Current Research Work: Subspace Indexing on Grassmann Manifold for Large Visual Recognition Problem
- Summary


3 About Myself
Zhu Li, Associate Professor; Director, MC2 Lab (Media Computing & Communication Lab)
Short Bio: PhD, ECE, Northwestern University
Research Experiences:
Research Interests:
- Visual Computing: Object Re-ID, Image Tagging, Point Cloud/Light Field Compression
- Visual Communication: Video compression, QoE, new media transport
- Networking and Mobile Edge Computing
[Figure labels: algorithm & computing, image understanding, visual communication, mobile edge computing & communication]

4 Research Motivation 2016

5 Media Computing & Communication in 2016
Devices and networks: mmWave 5G, 4k/8k UHD Video, Free Viewpoint TV, FD-MIMO, D2D, Immersive Visual Communication, SDN/MEC, Navigation & Object Re-Identification

6 Mobile Visual Search: Object Re-ID
Mobile Query-by-Capture object recognition pipeline:
- KD: Keypoint Detection (Box Filtering)
- FS: Feature Selection (affine-transformed 2-way matching)
- GD/LD: Global and Local Descriptor generation (AKULA, Collision Optimized Hash)
[Figure: pipeline from descriptor extraction (KD, FS, GD, LD) to descriptor encoding into a global/local descriptor bitstream, with a precision-recall plot]


7 Next Gen Immersive Visual Communication Platform
HMD Tracking (6 DoF)
- Track movement: dx, dy, dz
- Track angular movement
- RF + visual based tracking
- Stereo camera, LED
Gesture Tracking (27 DoF)
- 27 DoF, very complex
- Infrared structured light + stereo camera
HMD Eye Gaze Tracking (2 DoF)
- Somewhat easier, only 2 DoF
- Multi-source infrared illumination
- Purkinje image based tracking


8 Point Cloud Capture & Compression
Free Viewpoint TV from 3D point cloud capture
- VR applications: photorealistic object/human insertion in games
- Free viewpoint: replay technology
Compression challenges
- The plenoptic signal has an extremely large volume
- Communication and storage need compression at a 100x ratio
- MPEG working group on PCC: octree decomposition of the space, arithmetic coding of DFS octree nodes
- Our work: plane projection approximation, Graph Signal Processing
- Dynamic point cloud compression


9 Compact Knowledge Distilling for Embedded Devices
- Automatic image tagging: mapping image pixel values to tags
- Device-based recognition: distilling the knowledge into a compact form
- Compact CNN model for knowledge distilling
- BIGDATA training set: >10k images per tag
- Training set augmentation:
» affine transforms, simulated lighting changes
- Pruning filters from the CNN structure


10 Robust Content Identification & De-Duplication
- Rate Agnostic Robust Video Fingerprinting: robust and scalable video fingerprint for duplicate and near-duplicate search and mining applications (funded by a Microsoft Research grant, an HK RGC GRF grant, and Samsung DMC)
- Playback Verification: differential feature for playback verification (ads enforcement, broadcast monitoring), 200 bps, very lightweight signature
- Multi-Source Synchronization: synchronize multi-camera video and audio sources with a content-based approach
- Content Identification: content naming service for ICN, cache manifest, and de-duplication

11 Subspace Indexing on Grassmann Manifold for Large Scale Visual Recognition


12 Appearance Modeling: Finding a Good f()
- Find a good f() such that, after projecting the appearance onto the subspace, data points that belong to different classes are easily separable.
- Such a function is usually non-linear and hard to obtain from training data.

13 A Good Linear Model: f(x) = Ax
LDA (R. A. Fisher 1936 / P. Belhumeur 1997):
- Maximizes inter-class scatter over intra-class scatter
- Uses the intra-class and inter-class scatter matrices
- Solved as a generalized eigenvalue problem
[Figure: LDA/Fisherface basis a1, a2, a3, a4, a5]
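The LDA step above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the author's code: the function name `lda_basis`, the row-per-sample layout of `X`, and the small regularizer `reg` are all illustrative choices. It solves the generalized eigenproblem `Sb a = lambda Sw a` (with `Sb` the inter-class and `Sw` the intra-class scatter) by whitening `Sw` with a Cholesky factor.

```python
import numpy as np

def lda_basis(X, y, d, reg=1e-6):
    """Return a (d, D) LDA projection A for samples X of shape (n, D)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean = X.mean(axis=0)
    D = X.shape[1]
    Sw = np.zeros((D, D))  # intra-class (within) scatter
    Sb = np.zeros((D, D))  # inter-class (between) scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += reg * np.eye(D)  # assumed regularizer so that Sw is invertible
    # Whitening with Sw = L L^T turns Sb a = lambda Sw a into an
    # ordinary symmetric eigenproblem for z = L^T a.
    L = np.linalg.cholesky(Sw)
    Linv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(Linv @ Sb @ Linv.T)
    order = np.argsort(evals)[::-1][:d]  # largest scatter ratios first
    return (Linv.T @ evecs[:, order]).T

# Usage: project samples with Y = X @ A.T
```

The rows of the returned matrix are not orthonormal in general; they are the discriminant directions, ordered by decreasing scatter ratio.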

14 State of the Art in Linear Models: Graph Embedding Projection
LPP (X. F. He and P. Niyogi, circa 2004):
- Minimizes the weighted distance (a graph) between data points after projection
- Solved as a generalized eigenvalue problem
- Embeds a graph (as specified by the weights wj,k) with pruned edges, via a projection A
- The training data is reflected by the graph
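A minimal NumPy sketch of the LPP projection, assuming the standard formulation: minimize the sum over pairs of `w_jk * ||A x_j - A x_k||^2`, which leads to the generalized eigenproblem `X^T L X a = lambda X^T D X a` with `L = D - W` the graph Laplacian and `D` the degree matrix. The function name `lpp_basis`, the row-per-sample layout, and the regularizer `reg` are assumptions made for illustration.

```python
import numpy as np

def lpp_basis(X, W, d, reg=1e-6):
    """X: (n, D) samples, W: (n, n) symmetric affinity. Returns A of shape (d, D)."""
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    Dg = np.diag(W.sum(axis=1))          # degree matrix of the graph
    Lap = Dg - W                         # graph Laplacian
    P = X.T @ Lap @ X                    # weighted-distance penalty after projection
    Q = X.T @ Dg @ X + reg * np.eye(X.shape[1])  # scale constraint (regularized)
    # Whitening with Q = C C^T gives an ordinary symmetric eigenproblem.
    C = np.linalg.cholesky(Q)
    Cinv = np.linalg.inv(C)
    evals, evecs = np.linalg.eigh(Cinv @ P @ Cinv.T)  # ascending eigenvalues
    return (Cinv.T @ evecs[:, :d]).T     # keep the d smallest: connected points stay close
```

With an affinity `W` that is non-zero only for same-class pairs, this reduces to an LDA-like graph, which matches the common graph embedding view of the two methods.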


15 Common Graph Embedding Interpretation
Both LPP and LDA are graph embedding linear projections:
- LDA:
» preserves an affinity matrix that has zero affinity for pairs of data points that do not belong to the same class
- LPP:
» has more flexibility in modeling the affinity wjk
[Figure: LPP affinity graph and LDA affinity graph]

16 Non-Linear Models
Non-linear solutions and challenges:
- Kernel methods, e.g., K-PCA, K-LDA, K-LPP, SVM:
» Evaluate the inner product <xj, xk> with a kernel function k(xj, xk) which, if it satisfies the conditions in Mercer's Theorem, implicitly maps the data via a non-linear function.
» Typically involve a QP problem with a Hessian of size n x n, which is not solvable when n is large.
- LLE / Graph Laplacian:
» An algorithm that maps input data {xk} to {yk} while trying to preserve an embedded graph structure among the data points.
» The mapping is data dependent and has difficulty handling new data outside the training set, e.g., a new query point.
How to compromise? Piece-wise linear approximation.


17 Piece-wise Linear: Query Driven
Query-driven piece-wise linear model:
- No pre-determined structure on the training data
- Local neighborhood data patch N(X, q) identified from the query point q
- Local model A(X, q) built with the local data
- Projection: Y = A(X, q) X
[Figure: local graph model showing the training data X, the query q, the local data N(X, q), and the local model A(X, q)]
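The query-driven idea can be sketched as follows: pick the k nearest training points of the query, fit a local linear model on that patch only, then classify in the projected space by nearest neighbour. This is an illustrative sketch, not the method's implementation. In particular, the local model here is a plain PCA of the patch, swapped in to keep the example short, whereas the slides build local LDA/LPP models. The names `local_model` and `classify` are hypothetical.

```python
import numpy as np

def local_model(X, q, k, d):
    """Build a local projection A(X, q) from the k nearest neighbours of q."""
    X = np.asarray(X, dtype=float)
    q = np.asarray(q, dtype=float)
    dist = np.linalg.norm(X - q, axis=1)
    idx = np.argsort(dist)[:k]            # local data patch N(X, q)
    patch = X[idx]
    mu = patch.mean(axis=0)
    # Stand-in local model: PCA of the patch (assumption; not LDA/LPP).
    _, _, Vt = np.linalg.svd(patch - mu, full_matrices=False)
    A = Vt[:d]                            # (d, D) with orthonormal rows
    return A, mu, idx

def classify(X, labels, q, k=30, d=2):
    """Nearest-neighbour label of q inside its own local subspace."""
    X = np.asarray(X, dtype=float)
    A, mu, idx = local_model(X, q, k, d)
    Y = (X[idx] - mu) @ A.T               # projected local data
    yq = (np.asarray(q, dtype=float) - mu) @ A.T
    nearest = idx[np.argmin(np.linalg.norm(Y - yq, axis=1))]
    return labels[nearest]
```

Because the patch depends on `q`, the model is rebuilt for every query and the full training set must be kept available at run time.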

18 DoF Criteria for Data Driven Local Model
What is a good local data size N(X, q)?
- The DoF of a linear model: DoF(A) = w x h x d
- Example: DoF(A) = 20 x 24 x 5 = 2400
Discriminant information to be captured:
- LDA: a graph with edges pruned for intra-class points
- LPP: a pruned graph
- Information function: the number of non-zero entries, i.e., edges/relationships among the n data points
What is a good compromise between data complexity and model power?

19 Discriminant Power Co-efficient (DPC)
DPC is computed as the ratio of the DoF and the information function of the local model.
[Figure: model validation accuracy as a function of DPC, with the overfitting region marked]

20 Head Pose Recognition Performance
Recognition rate is improved:
- DoF: w=18, h=18, d=16, 32
- Local data size: K=30
The cost in computation is rather modest:
- Matlab code, online local model A(X,q) learning and NN classification:


21 Face Recognition Performance
Local model combination in face recognition:
- The query point drives 3 local models, A1(X, q), A2(X, q), A3(X, q)
- Local model classification error estimation
- Combining the results by weighted voting
Multiple face models with different area and scale: (a) Upper face model (18 x 16). (b) Lower face model (14 x 18). (c) Full face model (21 x 28).
ORL data set test: leave 2 out:

22 Query Driven Piece-Wise Linear Summary
- Solid gains over the state-of-the-art global linear models
- Allows for piece-wise local linear approximation of the global non-linearity
- The reduced number of data points n makes the kernel solution more computationally tractable
Computational/storage complexity:
- Data driven: a model must be computed at run time, which adds computational complexity
- Storage penalty: extra storage is needed for all training data, because the local NN data patch is generated at run time as a function of the query point
- Indexing challenge: for very large scale problems it is not practical to store all training data, and effective indexing is needed to support efficient access to the query-induced local training data
Can we have a more compact way of distilling this intelligence?

23 Closer Examination of the DoF
Stiefel manifold:
- All possible sets of p orthonormal basis functions in d-dimensional space, A_{p x d}, span the Stiefel manifold S(p, d) in R^{d x p}, d > p.
- It requires an orthonormal relationship among the basis vectors, so the DoF is not pd.
What is missing?
[Figure: basis vectors a1, a2]

24 Model DoF on Grassmannian Manifolds
- The Grassmann manifold G(p, d) identifies p-dimensional subspaces in d-dimensional space. It consists of p orthonormal bases in d-dimensional space (the Stiefel manifold) plus an equivalence constraint on the rotation of the basis functions.
- G(p, d) is the quotient space S(p, d)/O(p), i.e., A1 and A2 are treated as the same point when one is a rotation of the other within the same subspace.
- The DoF of subspaces on the Grassmann manifold
[Figure: basis vectors a1, a2]
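The DoF counts discussed here follow from standard dimension formulas, which this short sketch evaluates. The formulas are well-known facts rather than text recovered from the slides: the p orthonormal vectors lose p(p+1)/2 degrees of freedom to the orthonormality constraints, and the quotient by O(p) removes another p(p-1)/2. The function names are illustrative.

```python
def stiefel_dof(p, d):
    """Dimension of the Stiefel manifold S(p, d): pd minus orthonormality constraints."""
    return p * d - p * (p + 1) // 2

def grassmann_dof(p, d):
    """Dimension of the Grassmann manifold G(p, d) = S(p, d)/O(p)."""
    return p * (d - p)

# Example with the sizes used for appearance models of 12x10 images:
# d = 120, p = 8, compared with the naive count p*d = 960.
print(stiefel_dof(8, 120), grassmann_dof(8, 120))  # 924 896
```

The gap between the two counts, p(p-1)/2 = 28 for p = 8, is exactly the dimension of the rotation group O(p) that the Grassmann manifold factors out.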

25 Visualizing Subspace Models DoF
Consider a typical appearance modeling problem:
- Image size 12x10 pixels, appearance space dimension d=120, model dimension p=8
- 3D visualization of all S(8, 120) and their covariance eigenvalues
- Grassmann manifolds are the quotient space S(8, 120)/O(8)

26 Principal Angles
The principal angles between two subspaces:
- For A1 and A2 in G(p, d), the principal angles are defined through pairs of vectors {uk} and {vk}, which are called the principal dimensions for span(A1) and span(A2).

27 Principal Angles Computing
The principal angles between two subspaces:
- Rotate A1 and A2 in G(p, d) such that they are maximally aligned. This is solved by an SVD of the p x p matrix of inner products between the two bases, which yields U, S, and V.
- U = [u1, u2, ..., up] and V = [v1, v2, ..., vp] give the principal dimensions.
- The diagonal of S, [s1, s2, ..., sp], holds the cosines of the principal angles, so the principal angles are computed as $\theta_k = \arccos(s_k)$.
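A minimal NumPy sketch of this computation, assuming each basis is stored as a (d, p) matrix with orthonormal columns (the slides appear to write A as p x d, so this layout is an assumption). The function name `principal_angles` is illustrative.

```python
import numpy as np

def principal_angles(A1, A2):
    """A1, A2: (d, p) matrices with orthonormal columns.

    Returns (theta, P1, P2): the principal angles in ascending order and
    the matching principal vectors of span(A1) and span(A2) as columns.
    """
    U, s, Vt = np.linalg.svd(A1.T @ A2)   # p x p matrix of inner products
    s = np.clip(s, -1.0, 1.0)             # guard against rounding outside [-1, 1]
    theta = np.arccos(s)                  # singular values are the cosines
    return theta, A1 @ U, A2 @ Vt.T

# Usage: two planes in R^3 that share e1 and are tilted by 0.3 rad.
A1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
A2 = np.array([[1.0, 0.0], [0.0, np.cos(0.3)], [0.0, np.sin(0.3)]])
theta, P1, P2 = principal_angles(A1, A2)
```

Since `numpy.linalg.svd` returns singular values in descending order, the angles come out in ascending order, with the most aligned pair of directions first.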

28 Subspace Metrics on Grassmann Manifold
Subspace distance metrics:
- Projection Distance
» Def:
» Computing:
- Binet-Cauchy Distance
» Def:
» Computing:

29 Subspace Distance on Grassmannian Manifold
Subspace distances:
- Arc Distance
» Def:
» Also known as the geodesic distance. It traverses the Grassmannian surface, and two subspaces collapse into one when all principal angles become zero.
[Figure: subspaces A1 and A2 as points on the Grassmannian surface]
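The three subspace distances named above can be sketched in terms of the principal angles. The formulas used here are the commonly cited definitions (projection: square root of the sum of squared sines; Binet-Cauchy: square root of one minus the product of squared cosines; arc: 2-norm of the angle vector) and are assumed to match the slides, whose equations are not available. Bases are assumed to be (d, p) matrices with orthonormal columns.

```python
import numpy as np

def _angles(A1, A2):
    """Principal angles between span(A1) and span(A2)."""
    s = np.linalg.svd(A1.T @ A2, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def projection_distance(A1, A2):
    return np.sqrt(np.sum(np.sin(_angles(A1, A2)) ** 2))

def binet_cauchy_distance(A1, A2):
    return np.sqrt(max(0.0, 1.0 - np.prod(np.cos(_angles(A1, A2)) ** 2)))

def arc_distance(A1, A2):
    """Geodesic distance: zero exactly when the two subspaces coincide."""
    return np.sqrt(np.sum(_angles(A1, A2) ** 2))
```

The first two distances also have closed forms that avoid the angles: the projection distance equals the Frobenius norm of `A1 @ A1.T - A2 @ A2.T` divided by the square root of 2, and the Binet-Cauchy distance equals the square root of `1 - det(A1.T @ A2)**2`.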

30 Combination of Two Subspaces on Grassmann Manifold
What if we need to merge two subspaces?
- Motivation:
» if subspace A1 is best for data set S1, and subspace A2 is best for data set S2, can we find a subspace A3 that is good for both?
- When two subspaces are sufficiently close on the Grassmannian manifold, we can approximate this by A3 = [t1, t2, ...], with
$t_k = \frac{n_1}{n_1 + n_2} u_k + \frac{n_2}{n_1 + n_2} v_k$
where n1 and n2 are the sizes of data sets S1 and S2.
- The new set of basis vectors may not be orthogonal. This can be corrected by Gram-Schmidt orthogonalization.
[Figure: A1 and A2 on the Grassmannian surface with the merged subspace between them]
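The merge rule can be sketched as below, assuming (d, p) bases with orthonormal columns and taking u_k and v_k to be the principal vectors obtained from the SVD of `A1.T @ A2`. A QR factorization is used for the orthogonalization step, since it yields the same span as Gram-Schmidt. The function name `merge_subspaces` is illustrative.

```python
import numpy as np

def merge_subspaces(A1, A2, n1, n2):
    """Size-weighted merge of two nearby subspaces; returns a (d, p) orthonormal basis."""
    U, _, Vt = np.linalg.svd(A1.T @ A2)
    P1 = A1 @ U        # principal vectors u_k of span(A1)
    P2 = A2 @ Vt.T     # principal vectors v_k of span(A2)
    T = (n1 * P1 + n2 * P2) / (n1 + n2)   # t_k, weighted by data set sizes
    Q, _ = np.linalg.qr(T)                # re-orthonormalize the merged basis
    return Q

# Usage: planes sharing e1 and tilted by 0.4 rad, equal data set sizes.
A1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
A2 = np.array([[1.0, 0.0], [0.0, np.cos(0.4)], [0.0, np.sin(0.4)]])
A3 = merge_subspaces(A1, A2, 50, 50)
```

With equal sizes the merged plane sits halfway between the two inputs, tilted by 0.2 rad in this example. The approximation is only meaningful when the principal angles are small.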

31 Building Local Models from BIGDATA Training Set
The plan:
- Partition the (unlabeled) large training data set into local data patches
- Compute local models for each data patch with labels, and then optimize a subset for the recognition
Data space partition:
- Partition the training data set with a kd-tree. For a kd-tree of height h, we have 2^h local data patches as leaf nodes.
- For each leaf node data patch k, build a local LDA/LPP model Ak:
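A minimal sketch of the kd-tree partition step, producing 2^h leaf patches as index sets. The split rule (median split along the axis of largest variance) is an assumption chosen for illustration, since the slides do not specify one, and the function name `kd_partition` is hypothetical.

```python
import numpy as np

def kd_partition(X, height, idx=None):
    """Return a list of 2**height index arrays, one per leaf data patch."""
    X = np.asarray(X, dtype=float)
    if idx is None:
        idx = np.arange(len(X))
    if height == 0:
        return [idx]
    # Assumed split rule: the coordinate with the largest variance in this node.
    dim = np.argmax(X[idx].var(axis=0))
    order = idx[np.argsort(X[idx, dim])]
    half = len(order) // 2                 # median split keeps the leaves balanced
    return (kd_partition(X, height - 1, order[:half])
            + kd_partition(X, height - 1, order[half:]))

# Usage: 64 points in 3-D, height 3, giving 8 leaf patches of 8 points each.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
leaves = kd_partition(X, 3)
```

A local model would then be fitted on `X[leaf]` for each leaf. The sketch assumes there are enough points that no leaf ends up empty.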

32 Subspace Indexing
Subspace clustering by Grassmann metric: it is a VQ-like process.
- Start with a data partition kd-tree, its leaf nodes, and the associated subspaces {A_k}, k = 1..2^h
- Repeat:
» Find Ai and Aj such that d_arc(Ai, Aj) is the smallest among all pairs and the associated data patches are adjacent in the data space.
» Delete Ai and Aj, replace them with the merged new subspace, and update the associated set of data patch leaf nodes.
» Compute the empirical identification accuracy for the merged subspace.
» Add a parent pointer from Ai and Aj to the merged new subspace.
» Stop when only 1 subspace is left.
- Benefit:
» avoids forced merging of subspace models at data patches that are very different, though adjacent.
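The merging loop can be sketched as a greedy agglomeration over the leaf subspaces. This is a simplified illustration under stated assumptions: it omits the adjacency test on data patches and the empirical accuracy estimation, uses (d, p) orthonormal-column bases, and all function names are hypothetical.

```python
import numpy as np

def arc_distance(A, B):
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return np.sqrt(np.sum(np.arccos(np.clip(s, -1.0, 1.0)) ** 2))

def merge(A, B, na, nb):
    U, _, Vt = np.linalg.svd(A.T @ B)
    T = (na * (A @ U) + nb * (B @ Vt.T)) / (na + nb)
    return np.linalg.qr(T)[0]

def build_hierarchy(leaves, sizes):
    """leaves: list of (d, p) bases; sizes: number of samples per leaf.

    Returns (models, sizes, parent), where parent[i] is the index of the
    merged node that replaced model i (None for the final root).
    """
    models, sizes = list(leaves), list(sizes)
    parent = [None] * len(models)
    active = list(range(len(models)))
    while len(active) > 1:
        # Closest pair under the arc distance (adjacency check omitted here).
        i, j = min(((a, b) for k, a in enumerate(active) for b in active[k + 1:]),
                   key=lambda ab: arc_distance(models[ab[0]], models[ab[1]]))
        models.append(merge(models[i], models[j], sizes[i], sizes[j]))
        sizes.append(sizes[i] + sizes[j])
        parent.append(None)
        parent[i] = parent[j] = len(models) - 1   # parent pointers to the merged node
        active = [a for a in active if a not in (i, j)] + [len(models) - 1]
    return models, sizes, parent
```

Starting from n leaves the loop produces n - 1 merged models, so the hierarchy holds 2n - 1 subspaces in total, with the last one acting as the global model.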


33 MHT Based Identification
MHT operation:
- Organize the leaf node models into a new hierarchy, with new models and associated accuracy (error rate) estimates.
- When a probe point comes, first identify its leaf node from the data partition tree.
- Then traverse the MHT from the leaf node up until it hits the root, which is the global model, and choose the best model along the path for identification.
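The leaf-to-root traversal can be sketched in a few lines. The representation is an assumption made for illustration: `parent` is a list of parent indices with `None` at the root, and `error` holds the estimated error rate of each node's model.

```python
def best_model_on_path(leaf, parent, error):
    """Walk from a leaf to the root and return the node with the lowest estimated error."""
    best, node = leaf, leaf
    while parent[node] is not None:
        node = parent[node]
        if error[node] < error[best]:
            best = node
    return best

# Usage: 4 leaves (0..3), merged nodes 4 and 5, root 6 (the global model).
parent = [4, 4, 5, 5, 6, 6, None]
error = [0.30, 0.20, 0.40, 0.10, 0.15, 0.25, 0.35]
print(best_model_on_path(0, parent, error))  # 4
print(best_model_on_path(3, parent, error))  # 3
```

The path length is bounded by the height of the hierarchy, so selecting a model costs only a handful of comparisons per probe once its leaf is known.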

34 Simulation
The data set:
- MSRA Multimedia data set, 65k images with class and relevance labels

35 Simulation
Data selection and features:
- Selected 12 classes with 11k images, using the original combined 889-dimensional features from color, shape, and texture
- Performance compared with PCA, LDA, and LPP modeling

36 Simulation
Face data set:
- Mixed data set of 242 individuals and 4840 face images
- Performance compared with PCA, LDA, and LPP modeling

37 Summary
Contributions:
- Addresses the DoF issues of the BIGDATA recognition problem.
- The piece-wise linear approach is effective in modeling non-linearity in visual manifolds for a variety of recognition problems.
- Subspace indexing on the Grassmann manifold offers a systematic approach to optimizing the linear models for the localized problem.
- Solid performance gains over the state-of-the-art global linear models and their kernelized non-linear versions.

38 Publications & Future Work
Future work:
- Grassmann Hashing: penalizing projection selection with a Grassmannian metric offers performance gains over LSH and spectral hashing.
- Grassmann Local Descriptor Aggregation (MPEG CDVS): significantly improved the indexing efficiency when working with Fisher Vector.
Publications:
- Z. Li, A. Nagar, G. Srivastava, and K. Park, "CDVS: GRASF - Grassmann Subspace Feature for Key Point Aggregation", ISO/IEC SC29/WG11/MPEG2014/m32536, San Jose, USA.
- X. Wang, Z. Li, and D. Tao, "Subspace Indexing on Grassmann Manifold for Image Search", IEEE Trans. on Image Processing, vol. 20(9).
- X. Wang, Z. Li, L. Zhang, and J. Yuan, "Grassmann Hashing for Approx Nearest Neighbour Search in High Dimensional Space", Proc. of IEEE Int'l Conf on Multimedia & Expo (ICME), Barcelona, Spain.
- H. Xu, J. Wang, Z. Li, G. Zeng, and S. Li, "Complementary Hashing for Approximate Nearest Neighbor Search", IEEE Int'l Conference on Computer Vision (ICCV), Barcelona, Spain.
- Yun Fu, Z. Li, J. Yuan, Ying Wu, and Thomas S. Huang, "Locality vs. Globality: Query-Driven Localized Linear Models for Facial Image Computing", IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), vol. 18(12), December.


39 Acknowledgment
- Grants: the work is partially supported by a Hong Kong RGC Grant and a Microsoft Research Asia faculty grant.
- My student:
» Xinchao Wang, valedictorian of the Dept of COMP, HK Polytechnic University, class of 2010; completed his PhD at EPFL (Pascal Fua), now at UIUC IFP.

40 Q&A
Thanks!
