Columbia University High-Level Feature Detection: Parts-based Concept Detectors

Similar documents
Columbia University TRECVID-2005 Video Search and High-Level Feature Extraction

EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION. Ing. Lorenzo Seidenari

The Stanford/Technicolor/Fraunhofer HHI Video Semantic Indexing System

Visual Search: 3 Levels of Real-Time Feedback

Latest development in image feature representation and extraction

Experimenting VIREO-374: Bag-of-Visual-Words and Visual-Based Ontology for Semantic Video Indexing and Search

EE 6882 Statistical Methods for Video Indexing and Analysis

ELL 788 Computational Perception & Cognition July November 2015

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 2013 ISSN:

Exploring Semantic Concept Using Local Invariant Features

Semantic Video Indexing

A Reranking Approach for Context-based Concept Fusion in Video Indexing and Retrieval

An Introduction to Content Based Image Retrieval

Statistical Part-Based Models: Theory and Applications in Image Similarity, Object Detection and Region Labeling

MULTIVIEW RANK LEARNING FOR MULTIMEDIA KNOWN ITEM SEARCH

Video annotation based on adaptive annular spatial partition scheme

Multiple Kernel Learning for Emotion Recognition in the Wild

Content-based image and video analysis. Event Recognition

Large-Scale Semantics for Image and Video Retrieval

AUTOMATIC VIDEO INDEXING

ImgSeek: Capturing User s Intent For Internet Image Search

Content Based Image Retrieval

The LICHEN Framework: A new toolbox for the exploitation of corpora

Segmentation. Bottom up Segmentation Semantic Segmentation

Multimedia Event Detection for Large Scale Video. Benjamin Elizalde

Utilizing Semantic Word Similarity Measures for Video Retrieval

Video search requires efficient annotation of video content To some extent this can be done automatically

arxiv: v1 [cs.mm] 12 Jan 2016

Exploratory Product Image Search With Circle-to-Search Interaction

Using the Forest to See the Trees: Context-based Object Recognition

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Consumer Video Understanding

over Multi Label Images

IBM Research Report. Semantic Annotation of Multimedia Using Maximum Entropy Models

Class 5: Attributes and Semantic Features

Query Expansion for Hash-based Image Object Retrieval

Tri-modal Human Body Segmentation

ImageCLEF 2011

YUSUF AYTAR B.S. Ege University

Multimodal Information Spaces for Content-based Image Retrieval

Metric learning approaches! for image annotation! and face recognition!

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao

Searching Video Collections:Part I

Content-Based Multimedia Information Retrieval

Optimal Video Adaptation and Skimming Using a Utility-Based Framework

Multimedia Information Retrieval The case of video

Active Query Sensing for Mobile Location Search

3D Face and Hand Tracking for American Sign Language Recognition

Learning Spatial Context: Using Stuff to Find Things

WP1: Video Data Analysis

Understanding Sport Activities from Correspondences of Clustered Trajectories

Visual Event Recognition in News Video using Kernel Methods with Multi-Level Temporal Alignment

E6885 Network Science Lecture 11: Knowledge Graphs

TA Section: Problem Set 4

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213)

High-level Event Recognition in Internet Videos

A REVIEW ON SEARCH BASED FACE ANNOTATION USING WEAKLY LABELED FACIAL IMAGES

Search Engines. Information Retrieval in Practice

Video Search Reranking via Information Bottleneck Principle

Visual Query Suggestion

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation

Elimination of Duplicate Videos in Video Sharing Sites

Browsing News and TAlk Video on a Consumer Electronics Platform Using face Detection

Learning Realistic Human Actions from Movies

Rushes Video Segmentation Using Semantic Features

Table Of Contents: xix Foreword to Second Edition

Automatic Initialization of the TLD Object Tracker: Milestone Update

IBM Research TRECVID-2007 Video Retrieval System

Exploring Geotagged Images for Land-Use Classification

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009

Research Article Image and Video Indexing Using Networks of Operators

Sparse Models in Image Understanding And Computer Vision

Automatic Collection of Web Video Shots Corresponding to Specific Actions

Scene Modeling Using Co-Clustering

Evaluation and comparison of interest points/regions

An Efficient Semantic Image Retrieval based on Color and Texture Features and Data Mining Techniques

Segmenting Objects in Weakly Labeled Videos

LIG and LIRIS at TRECVID 2008: High Level Feature Extraction and Collaborative Annotation

Leveraging flickr images for object detection

QMUL-ACTIVA: Person Runs detection for the TRECVID Surveillance Event Detection task

Action recognition in videos

Detection III: Analyzing and Debugging Detection Methods

CS 231A Computer Vision (Fall 2011) Problem Set 4

Integrating Visual and Textual Cues for Query-by-String Word Spotting

Deformable Part Models

Binju Bentex *1, Shandry K. K 2. PG Student, Department of Computer Science, College Of Engineering, Kidangoor, Kottayam, Kerala, India

Cross Reference Strategies for Cooperative Modalities

Object recognition (part 2)

Announcements. Recognition. Recognition. Recognition. Recognition. Homework 3 is due May 18, 11:59 PM Reading: Computer Vision I CSE 152 Lecture 14

Describable Visual Attributes for Face Verification and Image Search

DIGITAL VIDEO now plays an increasingly pervasive role

2. LITERATURE REVIEW

Class 9 Action Recognition

70 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 6, NO. 1, FEBRUARY ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing

Contextual priming for artificial visual perception

Large-scale visual recognition The bag-of-words representation

Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

Multimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig

Effective Moving Object Detection and Retrieval via Integrating Spatial-Temporal Multimedia Information

Transcription:

TRECVID 2005 Workshop Columbia University High-Level Feature Detection: Parts-based Concept Detectors Dong-Qing Zhang, Shih-Fu Chang, Winston Hsu, Lexin Xie, Eric Zavesky Digital Video and Multimedia Lab Columbia University (In collaboration with IBM Research in ARDA VACE II Project) 1

data source and design principle Multi-lingual multi-channel video data 277 videos, 3 languages (ARB, CHN, and ENG) 7 channels, 10+ different programs Poor or missing ASR/MT transcripts A very broad concept space over diverse content object, site, people, program, etc TV05 (10), LSCOM-Lite (39), LSCOM (449) Concept detection in such a huge space is challenging Need a principled approach Take advantage of the extremely valuable annotation set Data-driven learning based approach offers potential for scalability 2

Insights from Samples: Object - flag Unique object appearance and structure Some even fool the annotator Variations in scale, view, appearance, number Noisy labels Sometimes contextual, spatial cues are helpful for detection Speaker, stage, sky, crowd 3

Site/location Again visual appearance and spatial structures very useful 4

Activity/Event Visual appearances capture the after effects of some events smoke, fire Sufficient cues for detecting occurrences of events Other events (e.g., people running) need object tracking and recognition 5

Motivation for Spatio-Appearance Models Many visual concepts characterized by Unique spatial structures and visual appearances of the objects and sites joint occurrences of accompanying entities with spatial constraints Motivate the deeper analysis of spatioappearance models 6

Spatio-Features: How to sample local features? traditional Color Moment Color Moment Block-based features: visual appearances of fixed blocks + block locations suitable for concepts with fixed spatial patterns Support Vector Machine (SVM) Adaptive Sampling: Object Parts Part Part relation Part-based model: Model appearance at salient points Model part relations Robust against occlusion, background, location change

Parts-based object detection paradigm also related to Human Vision System (HVS) Image Eye movement and fixation to get retinal images in local regions Pre-attentive stage Group retinal images into object Attentive stage [Rybak et al. 98 ] object

Our TRECVID 2005 Objectives Explore the potential strengths of parts-based models in detecting spatio-dominant concepts fusing with traditional fixed features models detecting other interesting patterns such as Near-Duplicates in broadcast news 9

How do we extract and represent parts? Part-based representation Interest points Segmented Regions Maximum Entropy Regions Part detection Gabor filter, PCA projection, Color histogram, Moments Feature Extraction within local parts Bag Structural Graph Attributed Relational Graph 10

Representation and Learning size; color; texture Individual images Salient points, high entropy regions spatial relation Attributed Relational Graph (ARG) Graph Representation of Visual Content Collection of training images machine learning Random Attributed Relational Graph (R-ARG) Statistics of attributes and relations Statistical Graph Representation of Model

Learning Object Model Re-estimate Matching Probability Patch image cluster Challenge : Finding the correspondence of parts and computing matching probability are NP-complete Solution : Apply and develop advanced machine learning techniques Loopy Belief Propagation (LBP), and Gibbs Sampling plus Belief Optimization (GS+BO) (demo)

Role of RARG Model: Explain object generation process Generative Process : From object model to image Object Model Object Instance Part-based Representation of Image Background parts 5 6 1 4 3 2 Random ARG ARG ARG Sampling node occurrence And node/edge features Sampling background pdf and add background parts 13

Object Detection by Random AG Binary detection problem : contain or not contain an object? H=1 H=0 Likelihood ratio test :, O: input ARG Object likelihood : PO ( H= 1) = PX ( H= 1) PO ( X, H= 1) X modeled by Association Graph Probabilities computed by MRF Likelihood ratio can be computed by variational methods (LPB, MC) n i 4 1 Random ARG for object model n j 2 3 Correspondence x iu 1 n u n v 2 5 ARG for image 3 4 Association Graph 6 14

Extension to Multi-view Object Detection Challenge of multi-view object/scene detection Objects under different views have different structures Part appearances are more diverse Structure variation could be handled by Random ARG model (each view covered by a sub-graph) Shared parts are visible from different views 15

Adding Discriminative Model for Multi-view Concept Detection Previous : Part appearance modeling by Gaussian distribution Now : Part appearance modeling by Support Vector Machine Use SVM plus non-linear kernels to model diverse part appearance in multiple views principle similar to boosting 16

Evaluation in TRECVID 2005 17

Parts-based detector performance in TRECVID 2005 Parts-based detector consistently improves by more than 10% for all concepts It performs best for spatio-dominant concepts such as US flag. It complements nicely with the discriminant classifiers using fixed features. Adding Parts-based Adding Parts-based Avg. performance over all concepts fixed feature Baseline SVM Spatio-dominant concepts: US Flag fixed feature Baseline SVM

Relative contributions Add text or change fusion models does not help Add parts-based Baseline SVM 19

Other Applications of Parts-Based Model: Detecting Image Near Duplicates (IND) Scene Change Parts-based Stochastic Attribute Relational Graph Learning Digitization Camera Change Digitization Learning Learning Pool Many Near-Duplicates in TRECVD 05 Measure IND likelihood ratio Stochastic graph models the physics of scene transformation Near duplicates occur frequently in multi-channel broadcast But difficult to detect due to diverse variations Problem Complexity Similarity matching < IND detection < object recognition Duplicate detection is the single most effective tool in our Interactive Search TRECVID 05 Interactive Search

Near Duplicate Benchmark Set (available for download at Columbia Web Site) 21

Examples of Near Duplicate Search in TRECVID 05 22

Application: Concept Search QueryConcept Search Query Text Find shots of a road with one or more cars Documents Subshots Part-of-Speech Tags - keywords road car Map to concepts WordNet Resnik semantic similarity Concept Metadata Names and Definitions Confidence for each concept Concept Models Simple SVM, Grid Color Moments, Gabor Texture Model Reliability Expected AP for each concept. Concept Space 39 dimensions (1.0) road (0.1) fire (0.2) sports (1.0) car. (0.6) boat (0.0) person Concept Space 39 dimensions (0.9) (0.9) road (0.1) (0.9) road (0.1) (0.9) fire road (0.9) fire road road (0.3) (0.1) (0.3) (0.1) sports fire (0.1) sports fire fire (0.9) (0.3) (0.9) (0.3) car sports (0.3) car sports sports. (0.9) car. (0.9) (0.9) car car (0.2). (0.2).. boat boat (0.1) (0.2) (0.1) (0.2) person boat (0.2) (0.1) person boat boat (0.1) person (0.1) person person Euclidean Distance Map text queries to concept detection Use humandefined keywords from concept definitions Measure semantic distance between query and concept Use detection and reliability for subshot documents

Concept Search Automatic - help queries with related concepts Find shots of boats. Method Story Text CBIR Concept Fused AP.169.002.115.195 Find shots of a road with one or more cars. Method AP Story Text.053 CBIR.009 Concept.090 Fused.095 Manual / Interactive Manual keyword selection allows more relationships to be found. Query Text Find shots of an office setting, i.e., one or more desks/tables and one or more computers and one or more people Concepts Office Query Text Find shots of a graphic map of Iraq, location of Bagdhad marked - not a weather map Concepts Map Query Text Find shots of one or more people entering or leaving a building Concepts Person, Building, Urban Query Text Find shots of people with banners or signs Concepts March or protest

Columbia Video Search Engine System Overview http://www.ee.columbia.edu/cuvidsearch User Level Search Objects Query topic class mining Cue-X reranking Interactive activity log automatic/manual search mining query topic classes cue-x re-ranking Interactive search user search pattern mining Multi-modal Search Tools combined text-concept search story-based browsing near-duplicate browsing Content Exploitation multi-modal feature extraction story segmentation semantic concept detection text search feature extraction (text, video, prosody) video speech text concept search Image matching automatic story segmentation story browsing Near-duplicate search concept detection near-duplicate detection Demo in the poster session

Search User Interface 26

Conclusions Parts-based models are intuitive and general Effective for concepts with strong spatioappearance cues Complementary with fixed feature classifiers (e.g., SVM) Semi-supervised: the same image-level annotations sufficient, no need for part-level labels Parts models also useful for detecting near duplicates in multi-source news Valuable for interactive search 27