The Stanford/Technicolor/Fraunhofer HHI Video Semantic Indexing System
Our first participation in the TRECVID workshop
A. F. de Araujo (1), F. Silveira (2), H. Lakshman (3), J. Zepeda (2), A. Sheth (2), P. Pérez (2), B. Girod (1)
(1) Stanford University, (2) Technicolor, (3) Fraunhofer HHI
Guest lecture in Berkeley CS294, Nov. 7th, 2012
Outline
- What is TRECVID?
- TRECVID Semantic Indexing task
- Overview of our system and how it differs from others
- Experimental results
- Conclusion
What is TRECVID?
- Part of the TREC (Text REtrieval Conference) series, sponsored by NIST (National Institute of Standards and Technology)
- Started as a TREC video track in 2001/2002 and became an independent evaluation in 2003
- Goal of the conference series: encourage research in information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results
What is TRECVID?
- De facto venue for state-of-the-art video analysis research
- TRECVID concludes with a workshop in Nov/Dec. Attendance is restricted to teams that submit results.
- 6 different tracks/tasks:
  - Semantic Indexing (SIN) (focus of this presentation)
  - Known-Item Search (KIS)
  - Surveillance Event Detection (SED)
  - Instance Search (INS)
  - Multimedia Event Detection (MED)
  - Multimedia Event Recounting (MER)
What is TRECVID? Known-Item Search (KIS)
- Use case: you've seen a specific video and want to find it again, but don't know how to go directly to it. You remember some things about it.
- System task: given a test collection of short videos and a topic (some words and/or phrases describing the target video, people, places, or things visible):
  - Automatically return a list of up to 100 video IDs ranked according to the likelihood that the video is the target one, or
  - Interactively return a single video ID believed to be the target
What is TRECVID? Interactive Surveillance Event Detection (SED)
- Use case: detection of events in large amounts of surveillance video
- System task: given a textual description of an observable event of interest, automatically detect all occurrences of the event in a non-segmented corpus of video
What is TRECVID? Instance Search (INS)
- Use case: browsing a video archive, you find a video of a person, place, or thing of interest to you, known or unknown, and want to find more video containing the same target, but not necessarily in the same context
- System task: given a topic with (a) example segmented images of the target (2-6), and (b) a target type (PERSON, CHARACTER, PLACE, OBJECT), return a list of up to 1000 shots ranked by the likelihood that they contain the topic target
What is TRECVID? Multimedia Event Detection (MED)
- Use case: searching for user-defined events using pre-computed metadata
- System task: given an event specified by an event kit (name, textual explanation, video exemplars), search multimedia recordings for the event
- Associated task, Multimedia Event Recounting: produce a textual recounting that summarizes the key evidence of the event
What is TRECVID? Semantic Indexing (SIN)
- Use cases: filtering, categorization, browsing, search, ...
- System task: given the test collection, master shot reference, and concept definitions, return for each concept a list of at most 2000 shot IDs from the test collection, ranked according to their likelihood of containing the concept
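The SIN output format lends itself to a short sketch. Assuming a detector has already produced a score for every (concept, shot) pair, each concept's answer is simply the shots sorted by decreasing score and cut at 2000 entries. The function name `rank_shots` and the shot IDs below are hypothetical, and the official TRECVID submission file format is not reproduced here.

```python
from typing import Dict, List

MAX_SHOTS = 2000  # SIN limit: at most 2000 shot IDs per concept


def rank_shots(scores: Dict[str, Dict[str, float]],
               max_shots: int = MAX_SHOTS) -> Dict[str, List[str]]:
    """Turn per-concept shot scores into ranked lists.

    scores maps concept -> {shot_id: classifier score}. Shots are sorted by
    decreasing score (ties broken by shot ID so the output is deterministic)
    and truncated to max_shots entries.
    """
    ranked: Dict[str, List[str]] = {}
    for concept, shot_scores in scores.items():
        ordered = sorted(shot_scores.items(), key=lambda kv: (-kv[1], kv[0]))
        ranked[concept] = [shot_id for shot_id, _ in ordered[:max_shots]]
    return ranked


if __name__ == "__main__":
    demo = {"Kitchen": {"shot1_1": 0.2, "shot1_2": 0.9, "shot2_1": 0.5}}
    print(rank_shots(demo, max_shots=2))  # {'Kitchen': ['shot1_2', 'shot2_1']}
```

Sorting the full score list is fine at the scale of one test collection (about 146k shots per concept); a heap-based top-k would only matter for much larger collections.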
TRECVID SIN task: Motivation
- YouTube reports 72 hours of video uploaded every minute; Flickr reports 1M photo uploads per day; etc.
- Video is increasingly ubiquitous and collections are increasingly large: from broadcasters to your personal collection
- Problems:
  - How can you find a video? Or a video segment?
  - How can you organize your video collection?
- Video Semantic Indexing can help: indexing videos with tags that represent objects, scenes, actions, events
TRECVID SIN task: Background (1/3)
- LSCOM effort [Naphade et al., 2006]: taxonomy of 1000 concepts, realistic use cases, large annotated set of broadcast news. Example use cases:
  - Armed uniformed soldiers walking on city lanes
  - U.S. maps depicting the electoral vote distribution (blue vs. red states)
- Based on these use cases, [Hauptmann et al., 2007] showed that keyword-based search performed poorly, and that including content helped a lot
TRECVID SIN task: Background (2/3)
- [Hauptmann et al., 2007]
  - Using 320 semantic concepts, performance improved significantly (from 1% to 10% MAP, in the case where concept detectors have low performance)
  - Extrapolation indicates that a few thousand concepts should be enough to bring video retrieval performance to the level of current text-based search engines (65% MAP)
  - Mindset: with a generic framework, detect a large number of concepts with reasonable performance
TRECVID SIN task: Background (3/3)
- Video search engine [Snoek et al., 2007]
  - User inputs an information need (query-by-keyword, query-by-image, query-by-concept)
  - System interprets the need, processes it based on metadata, and returns its best estimate of the relevant multimedia documents
TRECVID SIN task: how it works (1/5)
- Schedule: roughly 5 months from data release to submission date
- Three submission types:
  - LIGHT: 50 concepts (20 selected for evaluation)
  - FULL: 346 concepts (50 selected for evaluation)
  - PAIR: 10 concept pairs
- Each team can submit up to 4 runs
- A submission consists of a ranked list of 2000 shots per concept (as if it were the output of a system queried with that concept)
TRECVID SIN task: how it works (2/5)
- Videos range from 10 s to 4 min; mostly YouTube-like, user-generated content
- Unit of analysis: the shot (e.g., annotations are given at the shot level). Each video is composed of many shots
- Annotations are collected via a collaborative process with the participating teams. Initially, keyframes are shown to the annotator; the video is played only if necessary
- IMPORTANT: not all shots are annotated. An active-learning-based system runs during annotation to select the most useful samples to be annotated
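The slide does not say which selection criterion the annotation system uses. As an illustration only, the hypothetical function below implements two common active-learning heuristics for a single concept: relevance sampling (show the highest-scoring unlabeled shots first, useful when positives are rare) and uncertainty sampling (show the shots whose score is closest to the decision threshold).

```python
from typing import Dict, List, Set


def select_for_annotation(scores: Dict[str, float], annotated: Set[str],
                          batch_size: int, strategy: str = "relevance",
                          threshold: float = 0.0) -> List[str]:
    """Pick the next shots to show to annotators for one concept.

    scores: current classifier score per shot ID.
    annotated: shot IDs that already carry a label for this concept.
    "relevance": highest-scoring unlabeled shots first (likely positives).
    "uncertainty": unlabeled shots whose score is closest to threshold.
    """
    candidates = [s for s in scores if s not in annotated]

    def relevance_key(s: str):
        return (-scores[s], s)

    def uncertainty_key(s: str):
        return (abs(scores[s] - threshold), s)

    if strategy == "relevance":
        key = relevance_key
    elif strategy == "uncertainty":
        key = uncertainty_key
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(candidates, key=key)[:batch_size]
```

After each batch is labeled, the classifier would be retrained and the scores refreshed before selecting again.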
TRECVID SIN task: how it works (3/5)
- 2012 training data:
  - 19,701 videos (600 hours)
  - 400,289 shots (~20 shots/video)
  - On average, a concept has 1,225 positive (P) and 42,924 negative (N) annotated shots
- 2012 testing data:
  - 8,263 videos (200 hours)
  - 145,634 shots
- Videos come with some metadata: title, tags, short descriptions, speech-to-text output (but not very consistent)
- Data collected from the Internet Archive and representative of user-generated content
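A few ratios follow directly from the figures above and help explain why partial annotation and class imbalance matter for training. The snippet only does arithmetic on the slide's numbers; since the per-concept positive and negative counts are averages over concepts, the derived rates are averages too.

```python
# Figures from the 2012 SIN data slide
train_videos, train_shots = 19_701, 400_289
test_videos, test_shots = 8_263, 145_634
avg_pos, avg_neg = 1_225, 42_924  # average annotated P / N shots per concept

shots_per_train_video = train_shots / train_videos          # about 20.3
shots_per_test_video = test_shots / test_videos             # about 17.6
annotated_per_concept = avg_pos + avg_neg                   # 44,149 labeled shots
annotated_fraction = annotated_per_concept / train_shots    # about 11% of training shots
positive_rate = avg_pos / annotated_per_concept             # about 2.8% of labeled shots
neg_to_pos = avg_neg / avg_pos                              # about 35 negatives per positive

print(f"shots/video: train {shots_per_train_video:.1f}, test {shots_per_test_video:.1f}")
print(f"annotated shots per concept: {annotated_per_concept:,} "
      f"({annotated_fraction:.1%} of training shots)")
print(f"positive rate among annotated shots: {positive_rate:.1%} "
      f"(about 1 positive per {neg_to_pos:.0f} negatives)")
```

In other words, for a typical concept roughly 89% of the training shots carry no label at all, and the labeled ones are heavily skewed toward negatives.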
TRECVID SIN task: how it works (4/5)
- Example video 1. Concepts: Indoor, Kitchen, Room
- Example video 2. Concepts: Trees, Vegetation
- Example video 3. Concepts: Cityscape, Daytime_Outdoor, Outdoor, Scene_text, Streets, City, Text
TRECVID SIN task: how it works (5/5)
- Performance measures
  - Inferred Average Precision (infAP) per class: a metric conceived to approximate the usual Average Precision without the need to evaluate every single test instance [Yilmaz et al., 2008]
  - Mean infAP (MinfAP) for the final score: simple mean of the individual infAP values over classes
- Other measures, such as P-R and P@n curves, are provided, but the main one is MinfAP
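infAP estimates AP from a random sample of judged shots, and that estimator is not reproduced here. As a reference point, the sketch below computes the ordinary non-interpolated AP that infAP approximates (every ranked shot judged) and the simple mean over concepts that gives the final score. Function names are illustrative.

```python
from typing import Dict, Optional, Sequence


def average_precision(relevance: Sequence[int],
                      n_relevant: Optional[int] = None) -> float:
    """Non-interpolated AP of one ranked list.

    relevance[i] is 1 if the shot at rank i + 1 contains the concept, else 0.
    n_relevant is the total number of relevant shots in the collection; if
    omitted, only the relevant shots found in the list are counted.
    """
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    total = hits if n_relevant is None else n_relevant
    return precision_sum / total if total > 0 else 0.0


def mean_ap(ap_per_concept: Dict[str, float]) -> float:
    """Simple (unweighted) mean over concepts, as in MinfAP."""
    return sum(ap_per_concept.values()) / len(ap_per_concept)


if __name__ == "__main__":
    print(average_precision([1, 0, 1, 0, 0]))     # (1/1 + 2/3) / 2 = 0.8333...
    print(average_precision([1, 0, 1, 0, 0], 4))  # same sum divided by 4
```

Passing `n_relevant` matters: relevant shots that never make it into the 2000-shot list still count in the denominator and pull AP down.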
Overview of the Semantic Indexing System
- Overall architecture (figure). Shot annotation labels: P: positive, N: negative, M: missing, S: skip
Overview of the Semantic Indexing System
- Architecture in more detail (figure)
  - Each color is a feature channel
  - Input is a shot (in our case, only the keyframe is used)
Descriptor extraction
- Modalities:
  - Visual: keyframe-based descriptors shown to provide the most gain
  - Audio: significant gain only for some concepts
  - Tags, short descriptions: sparse, multilingual
  - Speech-to-text transcriptions: English only, even if videos are not in English
- We used only the keyframe-based visual modality
Descriptor extraction
- Local descriptors:
  - Keypoint selection
    - Combination of dense extraction (best) and the Harris-Laplace detector
    - Using different keypoint extraction methods has been shown to provide complementary gains
  - Patch description
    - SIFT descriptor
    - OppSIFT descriptor (SIFT computed on each channel of the opponent color space)
    - We observe gains when combining the two
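Two ingredients of this step can be sketched compactly: a dense grid of keypoints and the opponent color transform used by OppSIFT. The sketch uses the common opponent-space definition O1 = (R - G)/√2, O2 = (R + G - 2B)/√6, O3 = (R + G + B)/√3; the grid step and patch size are hypothetical values, since the slides do not give them. SIFT itself is left to an existing implementation, applied to each opponent channel with the descriptors concatenated.

```python
import numpy as np


def rgb_to_opponent(img: np.ndarray) -> np.ndarray:
    """Map an H x W x 3 RGB array to the opponent color space (O1, O2, O3)."""
    img = img.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    o1 = (r - g) / np.sqrt(2.0)            # red-green opponency
    o2 = (r + g - 2.0 * b) / np.sqrt(6.0)  # yellow-blue opponency
    o3 = (r + g + b) / np.sqrt(3.0)        # intensity
    return np.stack([o1, o2, o3], axis=-1)


def dense_grid(height: int, width: int, step: int = 8, patch: int = 16) -> np.ndarray:
    """(y, x) centers of patch x patch windows placed every `step` pixels,
    keeping only windows that fit entirely inside the image.
    step and patch are hypothetical defaults."""
    half = patch // 2
    ys = np.arange(half, height - half + 1, step)
    xs = np.arange(half, width - half + 1, step)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([yy.ravel(), xx.ravel()], axis=1)


if __name__ == "__main__":
    print(dense_grid(32, 32).shape)                  # (9, 2)
    print(rgb_to_opponent(np.array([[[1, 1, 1]]])))  # gray pixel: O1 = O2 = 0
```

A detector such as Harris-Laplace would replace `dense_grid` with image-dependent keypoints; the slide reports that combining both kinds of keypoints gives complementary gains.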
Descriptor extraction
- Global descriptor: CENTRIST/SPACT
  - Binary pattern for each pixel, based on comparisons with its neighbors (census transform)
  - Aggregated in histograms according to spatial location in the image: Spatial Principal component Analysis of Census Transform histograms (SPACT)
  - Shown to provide very good results at very low computational cost
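The census transform behind CENTRIST fits in a few lines: each interior pixel gets an 8-bit code, one bit per neighbor comparison, and the codes are histogrammed per spatial cell. This sketch assumes a particular neighbor ordering and the comparison `center >= neighbor`, and stops at the concatenated cell histograms; the PCA step and pyramid layout that turn these into SPACT are omitted.

```python
import numpy as np

# 8-neighbor offsets (dy, dx) in one fixed, assumed bit order
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def census_transform(gray: np.ndarray) -> np.ndarray:
    """8-bit census transform of the interior pixels of a 2-D grayscale image.
    Bit k is 1 when the center pixel is >= its k-th neighbor."""
    g = gray.astype(np.float64)
    h, w = g.shape
    center = g[1:h - 1, 1:w - 1]
    ct = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbor = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        ct |= (center >= neighbor).astype(np.int64) << bit
    return ct


def centrist_histograms(gray: np.ndarray, grid=(2, 2)) -> np.ndarray:
    """256-bin census-transform histogram per grid cell, concatenated."""
    ct = census_transform(gray)
    hists = []
    for row in np.array_split(ct, grid[0], axis=0):
        for cell in np.array_split(row, grid[1], axis=1):
            hists.append(np.bincount(cell.ravel(), minlength=256))
    return np.concatenate(hists)
```

Only comparisons and counting are involved, which is consistent with the slide's point about very low computational cost.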
Local descriptor aggregation
- BoVW + SPM
  - Histogram of occurrences of visual words
  - Pooled over different spatial regions of the image
- Residual vectors
  - Mean of differences (descriptor minus visual word) in each Voronoi cell
  - Inspired by the Fisher Vector approach [Perronnin et al., 2007]
  - No spatial aggregation, due to lack of time
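A minimal NumPy sketch of both aggregation schemes follows, assuming descriptors have already been extracted and a visual dictionary (`centers`) learned, for example by k-means. For the spatial pyramid it reads the 1x3 grid as three horizontal stripes; whether a whole-image histogram is also kept is not stated, so it is optional here. Normalization (L1, power or L2) is left out, and all function names are hypothetical.

```python
import numpy as np


def assign_words(descs: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the nearest visual word (Voronoi cell) for each descriptor."""
    d2 = ((descs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def bovw_spm(descs, ys, img_height, centers, n_stripes=3, include_whole_image=False):
    """Visual-word histograms pooled over horizontal stripes (1x3 grid by default).
    ys holds the vertical keypoint coordinate of each descriptor."""
    k = len(centers)
    words = assign_words(np.asarray(descs, float), np.asarray(centers, float))
    hists = [np.bincount(words, minlength=k)] if include_whole_image else []
    stripe = np.minimum((np.asarray(ys) * n_stripes) // img_height, n_stripes - 1)
    for s in range(n_stripes):
        hists.append(np.bincount(words[stripe == s], minlength=k))
    return np.concatenate(hists).astype(np.float64)


def residual_vector(descs, centers):
    """Mean of (descriptor - center) over the descriptors in each Voronoi cell,
    concatenated over cells; empty cells contribute zeros."""
    descs = np.asarray(descs, float)
    centers = np.asarray(centers, float)
    k, d = centers.shape
    words = assign_words(descs, centers)
    out = np.zeros((k, d))
    for j in range(k):
        members = descs[words == j]
        if len(members):
            out[j] = (members - centers[j]).mean(axis=0)
    return out.ravel()
```

The design trade-off is visible in the output sizes: BoVW stores one count per word and region, while a residual vector stores a full descriptor-sized vector per word, which is why the residual channels use much smaller dictionaries on PCA-reduced descriptors.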
Classification
- Machine learning: SVMs
  - One-versus-rest (shown to perform well at large scale in [Perronnin, 2012])
  - HIK and RBF kernels, depending on the feature type
  - Validation experiments based on Average Precision to choose the parameter C
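The two kernels named above are easy to precompute with NumPy. The sketch below builds histogram intersection (HIK) and RBF Gram matrices, the HIK one in row chunks to bound memory. A matrix like this can then be given to a precomputed-kernel SVM, for instance scikit-learn's `SVC(kernel="precomputed")`, training one binary classifier per concept and choosing C by validation AP (e.g. with `sklearn.metrics.average_precision_score`); the slides do not say which SVM library the authors used.

```python
import numpy as np


def hik_kernel(X, Y, chunk: int = 1024) -> np.ndarray:
    """Histogram intersection kernel K[i, j] = sum_d min(X[i, d], Y[j, d]),
    computed over row chunks of X to bound peak memory."""
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)
    K = np.empty((X.shape[0], Y.shape[0]))
    for start in range(0, X.shape[0], chunk):
        block = X[start:start + chunk]
        K[start:start + chunk] = np.minimum(block[:, None, :], Y[None, :, :]).sum(axis=-1)
    return K


def rbf_kernel(X, Y, gamma: float) -> np.ndarray:
    """RBF kernel K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off
```

For training, the kernel is computed between training shots; at test time, between test shots (rows) and training shots (columns), matching the O(400k^2) and O(400k x 100k) costs on the timing slide.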
Classification
- Late fusion
  - After obtaining classifier scores for each feature channel, they need to be combined
  - We perform a linear combination of the scores, with weights based (1) on validation performance, or (2) a simple average of scores
  - Another option, early fusion, performed worse in preliminary experiments
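Both fusion rules on this slide amount to a weighted sum of per-channel scores over the same list of shots. The hypothetical function below uses weights proportional to each channel's validation AP, or uniform weights when none are given. It assumes the channels' scores are already on comparable scales; in practice a per-channel normalization may be needed first.

```python
from typing import Dict, List, Optional


def late_fusion(channel_scores: Dict[str, List[float]],
                val_ap: Optional[Dict[str, float]] = None) -> List[float]:
    """Linear combination of per-channel scores for the same ordered list of shots.

    With val_ap, each channel's weight is its validation AP divided by the sum of
    validation APs; without it, every channel gets the same weight."""
    channels = list(channel_scores)
    if val_ap is None:
        weights = {c: 1.0 / len(channels) for c in channels}
    else:
        total = sum(val_ap[c] for c in channels)
        weights = {c: val_ap[c] / total for c in channels}
    n_shots = len(channel_scores[channels[0]])
    return [sum(weights[c] * channel_scores[c][i] for c in channels)
            for i in range(n_shots)]
```

Because fusion touches only one score per shot and channel, it is cheap compared with kernel computation, in line with the timing slide.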
Classification
- Co-occurrence information
  - Since the SIN task is a multi-label problem, co-occurrence should help
  - Example: the concept News Studio often occurs together with the concept Person
  - Very hard to exploit, since not all shots are annotated:
    - less than half of the shots have more than 10 annotations
    - 16% of the shots have more than 100 annotations
  - Previous work [Qi et al., 2007] shows that training with co-occurrence information is 25 times more computationally complex
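The following sketch illustrates why missing annotations make co-occurrence hard to use: an estimate of P(B | A), for example P(Person | News Studio), can only rely on shots annotated for both concepts. It is not part of the system described here, which did not exploit co-occurrence; the label encoding (+1 positive, -1 negative, 0 not annotated) is an assumption.

```python
from typing import Sequence, Tuple


def conditional_cooccurrence(labels_a: Sequence[int],
                             labels_b: Sequence[int]) -> Tuple[float, int]:
    """Estimate P(B positive | A positive) from partially annotated shots.

    Labels per shot: +1 positive, -1 negative, 0 not annotated (assumed encoding).
    Only shots where A is positive and B carries a label are used.
    Returns (estimate, support); the estimate is NaN when support is 0."""
    support = 0
    both_positive = 0
    for a, b in zip(labels_a, labels_b):
        if a == 1 and b != 0:
            support += 1
            if b == 1:
                both_positive += 1
    return (both_positive / support if support else float("nan"), support)
```

The returned support makes the problem explicit: with only part of the concepts labeled on any given shot, many concept pairs end up with very few shots to estimate from.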
Feature channels (grouped as BoW, Residual, Global)
- Dense keypoint extraction + OppSIFT. Bag-of-Words and Spatial Pyramid pooling in a 1x3 grid with a 4096-word visual dictionary;
- Dense keypoint extraction + SIFT. Bag-of-Words and Spatial Pyramid pooling in a 1x3 grid with a 4096-word visual dictionary;
- HarLap keypoint extraction + OppSIFT. Bag-of-Words pooling with a 4096-word visual dictionary;
- HarLap keypoint extraction + SIFT. Bag-of-Words pooling with a 4096-word visual dictionary;
- Residual vectors on PCA-reduced (to 64 dimensions) densely extracted OppSIFT, using a 256-word visual dictionary;
- Residual vectors on PCA-reduced (to 32 dimensions) densely extracted SIFT, using a 512-word visual dictionary;
- SPACT: Spatial Principal Component Analysis of Census Transform Histograms (CENTRIST).
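The size of each local-feature channel follows from the numbers in this list. For residual vectors it is (PCA dimension) × (dictionary size), so both residual channels come to 16,384 dimensions. For the SPM channels the count depends on how many regions the 1x3 grid yields, which the slide does not spell out; the sketch assumes three and flags it. SPACT is left out because its size is not given.

```python
SPM_REGIONS = 3  # assumption: the 1x3 grid gives 3 stripes; use 4 if a 1x1 level is also kept


def bow_dim(vocab_size: int, regions: int = 1) -> int:
    # one histogram bin per visual word and spatial region
    return vocab_size * regions


def residual_dim(desc_dim: int, vocab_size: int) -> int:
    # one desc_dim-sized mean residual per Voronoi cell
    return desc_dim * vocab_size


channel_dims = {
    "Dense OppSIFT, BoW + SPM 1x3": bow_dim(4096, SPM_REGIONS),
    "Dense SIFT, BoW + SPM 1x3": bow_dim(4096, SPM_REGIONS),
    "HarLap OppSIFT, BoW": bow_dim(4096),
    "HarLap SIFT, BoW": bow_dim(4096),
    "Dense OppSIFT (PCA 64), residual": residual_dim(64, 256),
    "Dense SIFT (PCA 32), residual": residual_dim(32, 512),
}

for name, dim in channel_dims.items():
    print(f"{name:36s} {dim:>6,d}")
```

Under the three-region assumption the SPM channels have 12,288 dimensions; with an extra whole-image level they would have 16,384, the same as the residual channels.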
Experimental results
- Inferred precision-recall curve (figure)

Experimental results
- Inferred curve (figure)

Experimental results
- Mean Inferred Average Precision per concept (figure)
Timing
- Descriptor extraction and aggregation: 1 to 2 days
- Precomputation of kernel matrices:
  - Training: 10 hours to 5 days, O((400k)^2)
  - Testing: 5 hours to 2.5 days, O(400k x 100k)
- Classifier training with precomputed kernels was limited by memory loading time: ~1 hour to load into memory, ~10 min to train a classifier
- Late fusion does not take a significant amount of time
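The cost of kernel precomputation is easier to appreciate in bytes. This back-of-the-envelope sketch uses the slide's round shot counts and assumes 4-byte (float32) kernel entries with no use of symmetry; neither choice is stated on the slides. It also shows the much smaller matrix for one concept's average annotated set (1,225 + 42,924 shots), since an SVM for one concept can only train on shots labeled for it.

```python
BYTES_PER_ENTRY = 4  # assumption: float32 entries, full (non-symmetric) storage


def kernel_gb(n_rows: int, n_cols: int, bytes_per_entry: int = BYTES_PER_ENTRY) -> float:
    """Memory, in decimal gigabytes, of a dense n_rows x n_cols kernel matrix."""
    return n_rows * n_cols * bytes_per_entry / 1e9


train_gb = kernel_gb(400_000, 400_000)       # training kernel, O((400k)^2) entries
test_gb = kernel_gb(400_000, 100_000)        # test kernel, O(400k x 100k) entries
per_concept_gb = kernel_gb(44_149, 44_149)   # one concept's average annotated shots

print(f"train: {train_gb:.0f} GB, test: {test_gb:.0f} GB, "
      f"one concept: {per_concept_gb:.1f} GB")
# train: 640 GB, test: 160 GB, one concept: 7.8 GB
```

Numbers of this magnitude are consistent with the slide's observation that loading kernels into memory, rather than SVM optimization itself, dominated classifier training time.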
Visualization (figure)
Conclusion
- We've built a quite complex system that performs semantic concept detection in a large-scale multimedia database (training on 400k shots, testing on 100k shots)
- We used advanced computer vision and machine learning tools, and had to make them all work together
- In our first participation, we achieved the 6th best performance in the TRECVID SIN task
Thank you
Project website: http://stanford.edu/~afaraujo/trecvid
http://stanford.edu/~afaraujo
Companion paper: A. F. de Araujo, F. Silveira, H. Lakshman, J. Zepeda, A. Sheth, P. Pérez and B. Girod, "The Stanford/Technicolor/Fraunhofer HHI Video Semantic Indexing System", TRECVID Workshop, November 2012.
More informationImproving Instance Search Performance in Video Collections
Improving Instance Search Performance in Video Collections Zhenxing Zhang School of Computing Dublin City University Supervisor: Dr. Cathal Gurrin and Prof. Alan Smeaton This dissertation is submitted
More informationBy Suren Manvelyan,
By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan,
More informationOBJECT CATEGORIZATION
OBJECT CATEGORIZATION Ing. Lorenzo Seidenari e-mail: seidenari@dsi.unifi.it Slides: Ing. Lamberto Ballan November 18th, 2009 What is an Object? Merriam-Webster Definition: Something material that may be
More informationSEARCHING pictures on smart phones, PCs, and the
IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 14, NO. 4, AUGUST 2012 1091 Harvesting Social Images for Bi-Concept Search Xirong Li, Cees G. M. Snoek, Senior Member, IEEE, Marcel Worring, Member, IEEE, and Arnold
More informationConsumer Video Understanding
Consumer Video Understanding A Benchmark Database + An Evaluation of Human & Machine Performance Yu-Gang Jiang, Guangnan Ye, Shih-Fu Chang, Daniel Ellis, Alexander C. Loui Columbia University Kodak Research
More informationVideo event detection using subclass discriminant analysis and linear support vector machines
Video event detection using subclass discriminant analysis and linear support vector machines Nikolaos Gkalelis, Damianos Galanopoulos, Vasileios Mezaris / TRECVID 2014 Workshop, Orlando, FL, USA, November
More informationUser Strategies in Video Retrieval: a Case Study
User Strategies in Video Retrieval: a Case Study L. Hollink 1, G.P. Nguyen 2, D.C. Koelma 2, A.Th. Schreiber 1, M. Worring 2 1 Section Business Informatics Free University Amsterdam De Boelelaan 1081a
More informationA System of Image Matching and 3D Reconstruction
A System of Image Matching and 3D Reconstruction CS231A Project Report 1. Introduction Xianfeng Rui Given thousands of unordered images of photos with a variety of scenes in your gallery, you will find
More informationThe AXES submissions at TrecVid 2013
The AXES submissions at TrecVid 2013 Robin Aly, Relja Arandjelovic, Ken Chatfield, Matthijs Douze, Basura Fernando, Zaid Harchaoui, Kevin Mcguiness, Noël O Connor, Dan Oneata, Omkar Parkhi, et al. To cite
More informationLatent Variable Models for Structured Prediction and Content-Based Retrieval
Latent Variable Models for Structured Prediction and Content-Based Retrieval Ariadna Quattoni Universitat Politècnica de Catalunya Joint work with Borja Balle, Xavier Carreras, Adrià Recasens, Antonio
More informationSelection of Scale-Invariant Parts for Object Class Recognition
Selection of Scale-Invariant Parts for Object Class Recognition Gy. Dorkó and C. Schmid INRIA Rhône-Alpes, GRAVIR-CNRS 655, av. de l Europe, 3833 Montbonnot, France fdorko,schmidg@inrialpes.fr Abstract
More informationRanking Error-Correcting Output Codes for Class Retrieval
Ranking Error-Correcting Output Codes for Class Retrieval Mehdi Mirza-Mohammadi, Francesco Ciompi, Sergio Escalera, Oriol Pujol, and Petia Radeva Computer Vision Center, Campus UAB, Edifici O, 08193, Bellaterra,
More informationTeam SRI-Sarnoff s AURORA TRECVID 2011
Team SRI-Sarnoff s AURORA System @ TRECVID 2011 Hui Cheng, Amir Tamrakar, Saad Ali, Qian Yu, Omar Javed, Jingen Liu, Ajay Divakaran, Harpreet S. Sawhney, Alex Hauptmann, Mubarak Shah, Subhabrata Bhattacharya,
More informationToward Retail Product Recognition on Grocery Shelves
Toward Retail Product Recognition on Grocery Shelves Gül Varol gul.varol@boun.edu.tr Boğaziçi University, İstanbul, Turkey İdea Teknoloji Çözümleri, İstanbul, Turkey Rıdvan S. Kuzu ridvan.salih@boun.edu.tr
More informationLarge-scale visual recognition The bag-of-words representation
Large-scale visual recognition The bag-of-words representation Florent Perronnin, XRCE Hervé Jégou, INRIA CVPR tutorial June 16, 2012 Outline Bag-of-words Large or small vocabularies? Extensions for instance-level
More informationCS4670: Computer Vision
CS4670: Computer Vision Noah Snavely Lecture 6: Feature matching and alignment Szeliski: Chapter 6.1 Reading Last time: Corners and blobs Scale-space blob detector: Example Feature descriptors We know
More informationSegmentation as Selective Search for Object Recognition in ILSVRC2011
Segmentation as Selective Search for Object Recognition in ILSVRC2011 Koen van de Sande Jasper Uijlings Arnold Smeulders Theo Gevers Nicu Sebe Cees Snoek University of Amsterdam, University of Trento ILSVRC2011
More informationIPL at ImageCLEF 2017 Concept Detection Task
IPL at ImageCLEF 2017 Concept Detection Task Leonidas Valavanis and Spyridon Stathopoulos Information Processing Laboratory, Department of Informatics, Athens University of Economics and Business, 76 Patission
More informationTHE POPULARITY of the internet has caused an exponential
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 4, APRIL 2011 381 Contextual Bag-of-Words for Visual Categorization Teng Li, Tao Mei, In-So Kweon, Member, IEEE, and Xian-Sheng
More informationCase-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.
CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance
More information