Graph-based Techniques for Searching Large-Scale Noisy Multimedia Data

Size: px

Start display at page:

Download "Graph-based Techniques for Searching Large-Scale Noisy Multimedia Data"

Samantha Shepherd
5 years ago
Views:

1 Graph-based Techniques for Searching Large-Scale Noisy Multimedia Data Shih-Fu Chang Department of Electrical Engineering Department of Computer Science Columbia University Joint work with Jun Wang (IBM), Tony Jebara (Columbia U), Wei Liu, Junfeng He, and Yu-Gang Jiang (Fudan U) 1

Graph-based Semi-Supervised Learning Given a small set of labeled data and a large number of unlabeled data in a high-dimensional feature space Build sparse graphs with local connectivity Propagate

2 Graph-based Semi-Supervised Learning Given a small set of labeled data and a large number of unlabeled data in a high-dimensional feature space Build sparse graphs with local connectivity Propagate information over graphs of large data sets Hopefully robust to noise and scalable to gigantic sets Input samples with sparse labels Label propagation on graph Label inference results Positive Negative Unlabeled Positive Negative 2

3 Intuition Capture local structures via sparse graph Nonlinear Classifier Graph Semi- Supervised Linear Learning Classifier Through Spare Graph Construction (e.g., knn)

4 Possible Applications: Propagating Labels in Interactive Search & Auto Re ranking Image\Video data Processing (denoising, cropping ) Feature Extraction Compute Similarity No predefined Category Graph Construction Existing Ranking/filtering System Top-rank results Interactive browse / label User Interface Applications Search, Browsing Re-ranking over large set Label Propagation S.-F. Chang, Columbia U. Automatic Mode 4 Interactive Mode

5 Example: Web Search Reranking Google Search Statue of Liberty Keyword Search Web Search Top images as + Bottom imgs as Label Diagnosis Diffusion

6 Application: Web Search Reranking Rerank Keyword Search Web Images Top images as + Bottom imgs as Label Diagnosis Diffusion

7 Application: Web Search Reranking Google Search Tiger Keyword Search Web Images Top images as + Bottom imgs as Label Diagnosis Diffusion

8 Application: Web Search Reranking Rerank Keyword Search Web Images Top images as + Bottom imgs as Label Diagnosis Diffusion How to Handle Noisy Labels before Propagation? Scalability?

9 Background Review Given a dataset of labeled samples, and unlabeled samples undirected graph of samples as vertices and edges weighted by sample similarity Define weight matrix ; vertex degree

10 Example Weight matrix Node degree D ??? Label matrix classes samples Graph-based SSL F Label prediction

11 Some Options of Constructing Sparse Graph Distance Threshold K-Nearest Neighbor Graph max 1 and B-Matched Graph (Huang and Jebara, AISTATS 2007) (Jebara, Wang, and Chang, ICML 2009) max

12 Several Ways of Constructing Sparse Graphs k,b=4 k,b=6 Distance threshold Rank threshold (knn) B-Match

13 Examples of Graph Construction (KNN) (B-Matching) k = 4 b = 4

14 Graph Construction Edge Weighting Binary Weighting Gaussian Kernel Weighting Locally Linear Reconstruction Weighting

15 Measure Smoothness: Graph Laplacian Graph Laplacian, and normalized Laplacian smoothness of function f over graph, Multi-class

16 Classical Methods: (Zhu et al ICML03, Zhou et al NIPS04, Joachim ICML03) Predict a graph function (F) via cost optimization prediction function function smoothness empirical loss Local and Global Consistency - LGC (Zhou et al, NIPS 04) Gaussian Random Fields GRF (Zhu et al, ICML03) 0

17 Empirical Observations (Jebara, Wang, and Chang, ICML 2009) Compare method-graphs-weights B-matching tends to outperform knn B-Matching particularly good for GTAM + local linear (LLR) weight GTAM GTAM GTAM GTAM GTAM GTAM 17

18 Noisy Label and other Challenges LGC Propagation GRF Propagation Unbalanced Labels Ill Label Locations Noisy Data and Labels 18

19 Label Unbalance A Quick Fix Normalize labels within each class based on node degrees Example: classes samples Label matrix Node degree matrix

20 Dealing with Noisy Labels Graph Transduction via Alternate Minimization ( GTAM, Wang, Jebara, & Chang, ICML, 2008) ( LDST, Wang, Jiang, & Chang, CVPR, 2009) Change uni variate optimization to bi variate formulation:

21 Alternate Optimization First, given Y solve continuous valued Then, search optimal integer Y given F* Gradient decent search

22 Alternate Minimization for Label Tuning Example: = Q = Add label: Delete label: (3,1) (1,1) = Iteratively repeat the above procedure

23 Example Toy Data Consider adding label only Label propagation by GTAM Convergence procedure (non-monotonic due to discrete step size) Unlabeled Positive Negative

24 Initial Labels Label Diagnosis and Self Tuning ( LDST, Wang, Jian, & Chang, CVPR, 2009) Add label: Delete label: Iteration # 2 Iteration # 6 Decline of the cost function Q over iterations (with vs. without label tuning)

25 Application: Web Search Reranking Google Search Tiger Keyword Search Web Images Top images as + Bottom imgs as Label Diagnosis Diffusion

26 Application: Web Search Reranking Rerank Keyword Search Web Images Top images as + Bottom imgs as Label Diagnosis Diffusion

27 Figure 4. Example images of text search results from flickr.com. A total of nine text queries are used: dog, tiger, panda, bird, flower, airplane, forbidden city, statue of liberty, golden bridge.

28 Effects of Graph based reranking VisualRank: Jing & Baluja, 08

$Possible Applications: Propagating Labels in Interactive Search & Auto Re ranking Image\Video data Processing (denoising, cropping ) Feature Extraction Compute Similarity No$

29 Possible Applications: Propagating Labels in Interactive Search & Auto Re ranking Image\Video data Processing (denoising, cropping ) Feature Extraction Compute Similarity No predefined Category Graph Construction Interactive browse / label User Interface Applications Search, Browsing Label Propagation Interactive Mode S.-F. Chang, Columbia U. 29

30 Application: Brain Machine Interface for Image Retrieval -- denoise unreliable labels from brain signal decoding (joint work with Sajda et al, ACMMM 2009, J. of Neural Engineering, May 2011) Use EEG brain signals to detect target of interest Use image graph to tune & propagate information

31 The Paradigm Database (any target that may interest users) 31

32 The Paradigm Neural (EEG) decoder EEG-scores Database 32

33 The Paradigm Neural (EEG) decoder Exemplar labels (noisy) image features Graph-based Semi-Supervised Learning Database prediction score 33

34 The Paradigm Pre-triage Post-triage 34

35 The Paradigm Human inspects only a small sample set via BCI Machine filters out noise and retrieves targets from very large DB General: no predefined target models, no keyword High Throughput: neuro vision as bootstrap of fast computer vision Pre-triage Post-triage 35

36 The Neural Signatures of Recognition D. Linden, Neuroscientist, 2005, the Oddball Effect Standard Novel (P3a) Target Target (P3b) Novel time Novel Target Standard 36

37 Effect of graph-based reranking (BCI test) Top (noisy) results of Brain EEG signal detection Top results after graphbased label denoising & propagation P R curve significantly improved 37

(BCI VPM) Top 20 results of EEG (BCI VPM)

38 More Example Results Top 20 results of EEG detection Top 20 results of Hybrid System (BCI VPM) Top 20 results of EEG detection Top 20 results of Hybrid System (BCI VPM) 38

39 Graph over million points and more k-nn graph construction + label prediction infeasible for large-scale tasks Idea: AnchorGraph Regularization complexity: # anchors m << n (W. Liu, J. He, S.-F. Chang, ICML2010) time 14 x 1010 Time Complexity data size n

40 Active topic in research Large scale spectral analysis (Fergus et al, 09) Approximate solutions as linear combinations of a small number of eigenfucntions of graph Laplacian Elegant solutions with linear complexity But only applicable to ideal data distributions (separable uniform or Gaussian) Matrix approximation via Nyström (Zhang et al, 09) W = Complexity But may not be positive semidefinite > non convex

sparse local embedding Data-to-data similarity W = inner product in the

41 Idea: Build low-rank graph via anchors (Liu, He, Chang, ICML10) Use anchor points to abstract the graph structure Compute data-to-anchor similarity: sparse local embedding Data-to-data similarity W = inner product in the embedded space anchor points W18>0 x1 Z16 Z11 u6 u1 Z12 W14=0 u5 u2 u4 u3 x8 data points x4

42 Probabilistic Intuition Affinity between samples iand j, Wij = probability of two step Markov random walk, where = diag( ), m<< n AnchorGraph: sparse, positive semi-definite

43 AnchorGraph Regression Apply the same sparse embedding principle to labels The whole graph regularization process becomes low-rank Small matrix inversion Predicted function over graph = embedding matrix inferred labels on anchors

labels in the anchor space (~100 labels) label initial

44 Intuition: Anchor Graph SSL Use low rank ARG to infer optimal labels on anchors and samples Predict optimal labels in the anchor space (~100 labels) label initial mapping labels in out Propagate to original sample space (~million labels)

Performance -small data set USPS-Train: 7,291 images of digits, 10 classes, 10 samples per class AGR^0: K-means anchors and naïve Z AGR: K-means anchors and optimized Z Method Error

45 Performance -small data set USPS-Train: 7,291 images of digits, 10 classes, 10 samples per class AGR^0: K-means anchors and naïve Z AGR: K-means anchors and optimized Z Method Error Rate (%) Time (seconds) 1NN LGC with NN graph GFHF with NN graph AGR^ x speedup AGR accuracy comparable to analytical optimum

46 Large Data Set Evaluation 630,000 MNIST images over 10 classes, 100 labeled images only Conventional analytical solutions infeasible Among scalable solutions reduce error rates by 30% 50% Method Error Rate (%) Training Time (seconds) 1NN Eigenfunction ( 09) PVM ( 09) AGR^ AGR %-50% gain

47 Extension to Web-Scale Techniques described above not scalable to Web scale or dynamic data sets Cannot handle cases when n = ~ billions For dynamic data, updating graph is expensive Preferred: learn Inductive Models to handle novel dynamic data 47

48 Data Subsampling & Learn Inductive Model Web-scale database novel data point x anchors labels a predict x s label z(x) T f(x)=z(x)a one million data points subsampling data-to-anchor map z(x) x Anchor Graph Regularization Anchor Graph Construction seed labels anchor points 48

49 ARG over 80M Tiny Images + CIFAR 10 training images test images airplane automobile bird cat deer dog frog horse ship truck background

80Million Tiny Images Novel test sample 1Million samples (1% labels from CIFAR-10) ARG as inductive model Learn ARG Method 1NN Linear SVM Eigen Function 1K Anchors PVM 2k Anchors 1K Anchors AGR 2k

50 80Million Tiny Images Novel test sample 1Million samples (1% labels from CIFAR-10) ARG as inductive model Learn ARG Method 1NN Linear SVM Eigen Function 1K Anchors PVM 2k Anchors 1K Anchors AGR 2k Anchors Accuracy (%) 51.66± ± ± ± ± ± ± 0.28 Training Time (s) Test Time (s) 6.29e e e e e e e 4

51 Multi edge Graph Additional Issues Multiple relation edges between nodes Multi feature Graph Build graphs in multiple feature spaces Joint optimization Label tuning vs. Active Learning 51

52 Image Based Multi Edge Graph Liu et al, ACM Multimedia 2010 two images with the same tag dog, flower dog, bird one edge connecting the two regions sharing the tag, but not all How to propagate label over multiple edges? 52

Extension to Multi-Feature Graphs Feature 1 0.

1 0 50 100 150 200 How to handle noisy labels

2 0 0 20 40 60 80 100 Feature K How to handle

53 Extension to Multi-Feature Graphs Feature How to handle noisy labels in multiple graphs? Feature K How to handle noisy labels in multiple graphs? Graph 1 Label Propagation Graph K User Input Ranking list

54 Multi-graph SSL vs. single-graph Improve performance by 20%-80% Caltech 101 data set

55 References and Tools 1. X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi supervised learning using Gaussian fields and harmonic functions. ICML, D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Scholkopf. Learning with local and global consistency. NIPS, W. Liu, J. He, and S. F. Chang. Large graph construction for scalable semi supervised learning. ICML, Software: wliu/anchor Graph.zip. 4. W. Liu, J. Wang, S. Kumar, and S. F. Chang. Hashing with graphs. ICML, J. Wang, T. Jebara, and S. F. Chang. Graph transduction via alternating minimization. ICML, J. Wang, Y. G. Jiang, and S. F. Chang. Label diagnosis through self tuning for web image search. CVPR, W. Liu, J. Jun, and S. F. Chang, Robust and Scalable Graph Based Semi Supervised Learning. In Review, IEEE Proceedings, J. Wang, E. Pohlmeyer, B. Hanna, Y. G. Jiang, P. Sajda, and S. F. Chang, Brain State Decoding for Rapid Image Retrieval, ACM Multimedia Conference, J. Wang, A. Kumar, S. F. Chang, Semi Supervised Hashing for Scalable Image Retrieval, CVPR

EE 6882 Visual Search Engine

EE 6882 Visual Search Engine Relevance Feedback March 5 th, 2012 Lecture #7 Graph Based Semi Supervised Learning Application of Image Matching: Manipulation Detection What We have Learned Image Representation