Visual Search: 3 Levels of Real-Time Feedback

Size: px

Start display at page:

Download "Visual Search: 3 Levels of Real-Time Feedback"

Juniper Casey
5 years ago
Views:

1 Visual Search: 3 Levels of Real-Time Feedback Prof. Shih-Fu Chang Department of Electrical Engineering Digital Video and Multimedia Lab

2 A lot of work on Image Classification Analysis models Anchor Snow Soccer Building Outdoor Audio-visual features Geo, social, camera metadata User/mission context Rich semantic labels 2

3 Object Detection (PASCAL VOC) aeroplane bird bicycle boat train Person chair car cat bottle dining table tv/monitor cow dog horse potted plant motorbike bus sheep sofa

4 Research community growing fast! TRECVID Data domain amount types Lexicon size Broadcast news, documentary, surveillance, Internet, 400 hours Sound & Vision 170 hours Television News 100 hours BBC rushes (130,000+ subshots) video shots, keyframes 10 (2004, 2005) 39 (2006, 2007) 20 (2008) 130 (2010) LSCOM Broadcast news 170 hours Broadcast News video concepts CalTech256 Internet Images 30,607 images images 256 classes PASCAL Internet Images 9,963 images 24,640 annotated objects images, objects (as of Nov. 2009) 20 classes Tiny Image Internet Images 80,000,000 tiny images (32x32) images 75,378 WordNet nouns LabelMe Internet and UGC conten 30,369 images from 183 folders images, keyframes 111,490 object labels ImageNet Internet images 9,386,073 images images 14,847 WordNet synsets Lotus Hill Dataset Internet Images 500,000+ images and keyframes images, keyframes 280 object classes

CuZero: 400+ visual classifier models concept detection models: objects, people, location,

cityscape crowd desert dirt_gravel_road entertainment explosion_fire foresthighway hospital

peoplemarching person powerplants riot river road rpg shooting smoke tanksurban vegetation

5 CuZero: 400+ visual classifier models concept detection models: objects, people, location, scenes, events, etc airplane airplane_takeoff airport_or_airfieldarmed_personbuilding car cityscape crowd desert dirt_gravel_road entertainment explosion_fire foresthighway hospital insurgents landscape maps military military_base military_personnelmountainnighttime peoplemarching person powerplants riot river road rpg shooting smoke tanksurban vegetation vehicle waterscape_waterfront weaponsweather Columbia Runs Evaluation of 20 concepts at TRECVID 2008

6 TRECVID: Concept Detection Examples Top five classification results Classroom Demonstration Or Protest Cityscape Airplane flying Singing

flames visible Which classifiers to use? Which classifiers work?

7 Problem: User Gap Given a new search target, users have difficulty in choosing appropriate concept classifiers Find shots of something burning with flames visible Which classifiers to use? Which classifiers work? hundreds of classifiers car outdoor urban fire building airplane car crash road person Explosion

8 1 Specific exemplar Problem: Specific example vs. Generic concept Query: find person running around a building car Running vehicle building person road walking loading urban 2 Generic concept

9 Pains of Frustrated Users User forced to take one shot searches, iterating queries with a trial and error approach...

Semantic Search Target (query topic) C Concept

airplane car crash road sky person user CuZero

10 Semantic Search Target (query topic) C Concept Mapping olumbia 2 u Zero 1 Instant Query Concept Mapping Real-Time Navigation & Feedback Multi-Concept Navigation Concept Visualization niversity Real-time Query Manipulation Video Content (with metadata) content analytics thousands of semantic concepts car urban outdoo fire r building airplane car crash road sky person user CuZero Search Engine Concept Pool (Zavesky and Chang, Multimedia Info Retrieval MIR 08)

11 Instant Feedback: Instant visual concept suggestion Instant Classifier Suggestion query time concept mining S.-F. Chang, Columbia U. 11

12 Instant Feedback: Mapping formulate query refine Precise specification of search anchors Refine query by sliding inspector through the map Intuitive control: Closer to anchor is more like it Increases exploration breadth, instead of single list browsing Instant result display for each location, without new query computation anchor inspector anchor weights Road Car Outdoor Explosio n Fire 12

13 Demos Find lake front buildings in the Central Park Find person walking around building Find a car on a road in a snowy condition Find urban explosion scenes in UAV videos 13

Instant Feedback of Query Manipulation Multi-modal query input Visual examples (motion track,

road fire smoke Real-Time Sliding Query Panel 2D sliding panel for arbitrary query manipulation

Distributed, lightweight environment Distributed client-server approach for fast deployment with large

14 Instant Feedback of Query Manipulation Multi-modal query input Visual examples (motion track, object/image samples) >350 object/scene models Textual, geo-spatio-temporal cues running 1 running 2 road fire smoke Real-Time Sliding Query Panel 2D sliding panel for arbitrary query manipulation Instant query result update after refinement Intuitive feedback for query weight adjustment Distributed, lightweight environment Distributed client-server approach for fast deployment with large data archive. Prototypesof 200+ hours of TRECVID videos and VIRAT videos Smoke Road Explosion Running 1 Running 2 Query refinement

User feedback can be used in many other ways 1 2 Query Formulation Query Processing 3 Results Visual Examples Classifiers Cues/constraints Feature

15 User feedback can be used in many other ways 1 2 Query Formulation Query Processing 3 Results Visual Examples Classifiers Cues/constraints Feature Selection Distance Metric Ranking Model Updated results Handle novel data Online Update/Rerank New Classifiers Relevance Feedback ( track, event, activity) Query Log

Graph-based feedback propagation Propagate user feedback to larger collection Input samples with sparse labels Label propagation on graph Label

16 Graph-based feedback propagation Propagate user feedback to larger collection Input samples with sparse labels Label propagation on graph Label inference results Positive Negative Unlabeled Positive Negative - - graph - - graph node - - weight matrix - - risk function - - classification - - label matrix

17 A hot topic in Machine Learning Given initial labels, Y, find classification function F over graph nodes Label smoothness Fit known labels (Zhou, et al NIPS04) Gaussian fields & Harmonic functions (Zhu et al ICML03) 17

18 Many Challenging Issues LGC Method GFHF Method Unbalanced Labels Bad Label Locations Noisy Labels S.-F. Chang, Columbia U

19 Graph Transduction via Alternating Minimization (GTAM) (Wang, Jebara, Chang, ICML08) (Wang and Chang, CVPR09) -- Bivariate Optimization over Labels (Y) and Prediction (F) Propagation Step Given label (Y), propagate over graph, predict F Label Selection Step Iteratively add good labels or remove bad labels 19 (TAG demo)

20 Iteration # 2 Initial Labels Label Diagnosis and Self Tuning (Wang and Chang CVPR 09) Iteration # 4 Iteration # 6 The values of the cost function Q during optimization procedure of LDST and GTAM methods.

21 Use Pseudo-Feedback when user not available Google Search Statue of Liberty Keyword Search Flickr Photo Site Top images as + Bottom imgs as - Label Diagnosis Diffusion

22 Use Pseudo-Feedback when user not available Rerank Keyword Search Flickr Photo Site Top images as + Bottom imgs as - Label Diagnosis Diffusion

23 Use Pseudo-Feedback when user not available Google Search Tiger Keyword Search Flickr Photo Site Top images as + Bottom imgs as - Label Diagnosis Diffusion

24 Use Pseudo-Feedback when user not available Rerank Keyword Search Flickr Photo Site Top images as + Bottom imgs as - Label Diagnosis Diffusion

25 Feedback at a faster time scale via Brain State Signaling Human Vision: Superb by quick gist in the Blink of an Eye (Hubel, 1995) Joint work with Paul Sajda s group 25

26 Hybrid Computer-Human Vision System Goal: optimally integrate neuro-vision and computer vision/machine learning to maximize information throughput and retrieval accuracy of image content 26

27 The Intention Readout Experiment User thinks about what he/she wants to search Database (any target that may interest users) 27

28 The Paradigm Neural (EEG) decoder Interest-scores Database 28

29 The Paradigm Neural (EEG) decoder Exemplar labels (noisy) Features from the entire DB Database Semi-supervised computer vision retrieval prediction score 29

30 The Paradigm Pre-triage Post-triage 30

31 The Paradigm Human inspects only a small sample set via BCI Machine filters out noise and retrieves targets from very large DB General: no predefined target models, no keyword High Throughput: neuro-vision as bootstrap of fast computer vision Pre-triage Post-triage 31

Identifying Discriminative Components in the EEG Using Single-Trial Analysis LDA or Logistic Regression is used to learn the contributions of EEG signal components at different

32 Identifying Discriminative Components in the EEG Using Single-Trial Analysis LDA or Logistic Regression is used to learn the contributions of EEG signal components at different spatial-temporal locations (Parra, Sajda et al. 2002, 2003) Optimal spatial filtering across electrodes within each short window (e.g., 100ms) Optimal temporal filtering over time windows after onset 32

33 EEG Feedback and Manifold Learning Opportunities and Issues: EEG results used as exemplar feedback indicating user interest Propagate interest scores over manifolds in the image space Challenge: EEG labels are noisy and limited No prior knowledge about target models 33

Experiments CalTech101: 3798 images from 62 categories Satellite images Generic neural decoder trained per user using images (Soccer Ball or Baseball Gloves)

34 Experiments CalTech101: 3798 images from 62 categories Satellite images Generic neural decoder trained per user using images (Soccer Ball or Baseball Gloves) from Caltech256 A subset images randomly sampled to construct 6-Hz RSVP sequence Initial Trials: 4 subjects, 3 targets (Dalmatian, Chandelier/Menorah, & Starfish) 34

35 Example results Top 20 results of Neural EEG detection Top 20 results of Hybrid System (BCI-VPM) 35

36 Retrieval on Satellite Imagery The experimental results of helipad target RSVP, showing the top 20 ranked images. a) ranking by original EEG scores; b) ranking by the BCI-VPM refined interest score. 36

Conclusions Instant feedback and refinement tools improve the utility of small imperfect classifier pools As shown in CuZero search system Relevance feedback techniques maximizes the utility of

37 Conclusions Instant feedback and refinement tools improve the utility of small imperfect classifier pools As shown in CuZero search system Relevance feedback techniques maximizes the utility of limited imperfect user input As shown in graph-based propagation Other forms of relevance feedback Pseudo feedback from Web search Feedback from neuro state decoding digital video multimedia lab

38 References (many papers can be found at ) E. Zavesky, S.-F. Chang, CuZero: Embracing the Frontier of Interactive Visual Search for Informed Users, ACM Multimedia Information Retrieval, Y.-G. Jiang, A.Yanagawa, S.-F.Chang, C.-W.Ngo, CU-VIREO374: Fusing Columbia374 and VIREO374 for Large Scale Semantic Concept Detection, ADVENT Technical Report # Columbia University, August ( J. Wang, E. Pohlmeyer, B. Hanna, Y.-G. Jiang, P. Sajda, and S.-F. Chang, Brain State Decoding for Rapid Image Retrieval, ACM Multimedia Conference, Jun Wang, Yu-Gang Jiang, Shih-Fu Chang, Label Diagnosis through Self Tuning for Web Image Search, CVPR Jun Wang, Sanjiv Kumar, Shih-Fu Chang, Semi-Supervised Hashing for Scalable Image Retrieval, CVPR Junfeng He, Wei Liu, Shih-Fu Chang, Scalable Similarity Search with Optimized Kernel Hashing, ACM SIGKDD, Wei Liu, Junfeng He, Shih-Fu Chang, Large Graph Construction for Scalable Semi-Supervised Learning, ICML 2010.

Graph-based Techniques for Searching Large-Scale Noisy Multimedia Data

Graph-based Techniques for Searching Large-Scale Noisy Multimedia Data Shih-Fu Chang Department of Electrical Engineering Department of Computer Science Columbia University Joint work with Jun Wang (IBM),